Documentation ¶
Overview ¶
Package bq includes all code related to BigQuery.
NB: NOTES ON MEMORY USE AND HTTP SIZE The bigquery library uses JSON encoding of data, which appears to be the only option at this time. Furthermore, it uses intermediate data representations, eventually creating a map[string]Value (unless you pass that in to begin with). In general, when we start pumping large volumes of data, both the map and the JSON will cause some memory pressure, and likely pretty severe limits on the size of the insert we can send, likely on the order of a couple MB of actual row footprint in BQ.
Passing in slice of structs makes memory pressure a bit worse, but probably isn't worth worrying about.
Index ¶
- func GetClient(project string) (*bigquery.Client, error)
- func NewBQInserter(params etl.InserterParams, uploader etl.Uploader) (etl.Inserter, error)
- func NewColumnPartitionedInserter(pdt bqx.PDT) (row.Sink, error)
- func NewColumnPartitionedInserterWithUploader(pdt bqx.PDT, uploader etl.Uploader) (row.Sink, error)
- func NewInserter(dt etl.DataType, partition time.Time) (etl.Inserter, error)
- func NewSinkFactory() factory.SinkFactory
- type BQInserter
- func (in *BQInserter) Accepted() int
- func (in *BQInserter) Close() error
- func (in *BQInserter) Commit(rows []interface{}, label string) (int, error)
- func (in *BQInserter) Committed() int
- func (in *BQInserter) Dataset() string
- func (in *BQInserter) Failed() int
- func (in *BQInserter) Flush() error
- func (in *BQInserter) FullTableName() string
- func (in *BQInserter) InsertRow(data interface{}) error
- func (in *BQInserter) InsertRows(data []interface{}) error
- func (in *BQInserter) Project() string
- func (in *BQInserter) Put(rows []interface{}) error
- func (in *BQInserter) PutAsync(rows []interface{})
- func (in *BQInserter) RowsInBuffer() int
- func (in *BQInserter) TableBase() string
- func (in *BQInserter) TableSuffix() string
- type MapSaver
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewBQInserter ¶
NewBQInserter initializes a new BQInserter Pass in nil uploader for normal use, custom uploader for custom behavior TODO - improve the naming between here and NewInserter. TODO - migrate all the tests to use NewColumnPartitionedInserter.
func NewColumnPartitionedInserter ¶
NewColumnPartitionedInserter creates a new BQInserter for the specified BQ table using default bigquery uploader.
func NewColumnPartitionedInserterWithUploader ¶
NewColumnPartitionedInserterWithUploader creates a new BQInserter with appropriate characteristics. TODO - migrate all the tests to use this instead of NewBQInserter.
func NewInserter ¶
NewInserter creates a new BQInserter with appropriate characteristics.
func NewSinkFactory ¶
func NewSinkFactory() factory.SinkFactory
NewSinkFactory returns the default SinkFactory TODO inject a common bq client.
Types ¶
type BQInserter ¶
type BQInserter struct {
// contains filtered or unexported fields
}
BQInserter provides an API for inserting rows into a specific BQ Table.
func (*BQInserter) Accepted ¶
func (in *BQInserter) Accepted() int
func (*BQInserter) Close ¶
func (in *BQInserter) Close() error
Close synchronizes on the tokens, and closes the backing file.
func (*BQInserter) Commit ¶
Commit implements row.Sink. It is thread safe, and returns the number of rows successfull committed.
func (*BQInserter) Committed ¶
func (in *BQInserter) Committed() int
func (*BQInserter) Dataset ¶
func (in *BQInserter) Dataset() string
func (*BQInserter) Failed ¶
func (in *BQInserter) Failed() int
func (*BQInserter) Flush ¶
func (in *BQInserter) Flush() error
Flush synchronously flushes the rows in the row buffer up to BigQuery It is NOT threadsafe, as it touches the row buffer, so should only be called by the owning thread. Deprecated: Please use external buffer, Put, and PutAsync instead.
func (*BQInserter) FullTableName ¶
func (in *BQInserter) FullTableName() string
func (*BQInserter) InsertRow ¶
func (in *BQInserter) InsertRow(data interface{}) error
InsertRow adds one row to the insert buffer, and flushes if necessary. Caller should check error, and take appropriate action before calling again. NOT THREADSAFE. Should only be called by owning thread/goroutine. Deprecated: Please use external buffer, Put, and PutAsync instead.
func (*BQInserter) InsertRows ¶
func (in *BQInserter) InsertRows(data []interface{}) error
InsertRows adds rows to the insert buffer, and flushes if necessary. Caller should check error, and take appropriate action before calling again. NOT THREADSAFE. Should only be called by owning thread/goroutine. Deprecated: Please use external buffer, Put, and PutAsync instead.
func (*BQInserter) Project ¶
func (in *BQInserter) Project() string
func (*BQInserter) Put ¶
func (in *BQInserter) Put(rows []interface{}) error
Put sends a slice of rows to BigQuery, processes any errors, and updates row stats. It uses a token to serialize with any previous calls to PutAsync, to ensure that when Put() returns, all flushes have completed and row stats reflect PutAsync requests. (Of course races may occur if calls are made from multiple goroutines). It is THREAD-SAFE. It may block if there is already a Put or Flush in progress.
func (*BQInserter) PutAsync ¶
func (in *BQInserter) PutAsync(rows []interface{})
PutAsync asynchronously sends a slice of rows to BigQuery, processes any errors, and updates row stats. It uses a token to serialize with other (likely synchronous) calls, to ensure that when Put() returns, all flushes have completed and row stats reflect PutAsync requests. (Of course races may occur if these are called from multiple goroutines). It is THREAD-SAFE. It may block if there is already a Put or Flush in progress.
func (*BQInserter) RowsInBuffer ¶
func (in *BQInserter) RowsInBuffer() int
func (*BQInserter) TableBase ¶
func (in *BQInserter) TableBase() string