Documentation
Overview
Package bq is a library for working with BigQuery.
Limits
See the BigQuery docs at https://cloud.google.com/bigquery/quotas#streaminginserts for the most up-to-date streaming insert limits. Clients are responsible for ensuring that their use of bq stays within these limits.
A note on the maximum number of rows per request: Put() splits rows into batches so that no more than 10,000 rows are sent per request, and the batch size can be customized. BigQuery recommends 500 rows as a practical limit (this package uses it as the default) and suggests experimenting with your specific schema and data sizes to find the batch size that best balances throughput and latency for your use case.
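The batching rule above can be sketched in a few lines. This is a minimal, self-contained sketch rather than the package's code: the helper `batchRows` and the constant names are hypothetical, and clamping an oversized batch size down to 10,000 is an assumption about how the per-request cap is enforced.

```go
package main

import "fmt"

const (
	// maxRowsPerRequest is the per-request cap described above.
	maxRowsPerRequest = 10000
	// defaultBatchSize is BigQuery's recommended practical limit.
	defaultBatchSize = 500
)

// batchRows (hypothetical) splits rows into consecutive batches of at most
// batchSize rows. A non-positive batchSize falls back to defaultBatchSize,
// and a value above maxRowsPerRequest is clamped to it (an assumed policy).
func batchRows[T any](rows []T, batchSize int) [][]T {
	if batchSize <= 0 {
		batchSize = defaultBatchSize
	}
	if batchSize > maxRowsPerRequest {
		batchSize = maxRowsPerRequest
	}
	var batches [][]T
	for len(rows) > 0 {
		n := batchSize
		if len(rows) < n {
			n = len(rows)
		}
		// The full slice expression caps each batch's capacity so that
		// appending to one batch cannot overwrite the next.
		batches = append(batches, rows[:n:n])
		rows = rows[n:]
	}
	return batches
}

func main() {
	rows := make([]int, 1200)
	for _, b := range batchRows(rows, 0) {
		fmt.Println(len(b)) // prints 500, 500 and 200 on separate lines
	}
}
```

Each batch would then be sent as one request, so a larger batch size trades fewer round trips for bigger (and potentially slower) requests.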
Authentication
Authentication for Cloud projects happens during client creation; see https://godoc.org/cloud.google.com/go#pkg-examples. The form it takes depends on the application.
Monitoring
You can use tsmon (https://godoc.org/go.chromium.org/luci/common/tsmon) to track upload latency and errors.
If the Uploader.UploadsMetricName field is non-empty, Uploader creates a counter metric that tracks successes and failures.
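A minimal sketch of the optional-metric behavior, assuming a simple in-memory counter in place of a real tsmon Counter (whose API is not reproduced here). The `uploadCounter` type and its methods are hypothetical; the point is only that no metric exists unless a name is configured, and that recording against a missing metric is harmless.

```go
package main

import (
	"fmt"
	"sync"
)

// uploadCounter is a hypothetical stand-in for a tsmon Counter metric.
// It counts upload attempts keyed by outcome ("success" or "failure").
type uploadCounter struct {
	name   string
	mu     sync.Mutex
	counts map[string]int64
}

// newUploadCounter returns nil when name is empty, mirroring the rule that
// no metric is created unless UploadsMetricName is set.
func newUploadCounter(name string) *uploadCounter {
	if name == "" {
		return nil
	}
	return &uploadCounter{name: name, counts: map[string]int64{}}
}

// record adds n attempts with the given outcome. It is a no-op on a nil
// counter, so callers need not check whether a metric was configured.
func (c *uploadCounter) record(outcome string, n int64) {
	if c == nil {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[outcome] += n
}

// get returns the current count for outcome (0 for a nil counter).
func (c *uploadCounter) get(outcome string) int64 {
	if c == nil {
		return 0
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[outcome]
}

func main() {
	c := newUploadCounter("/chrome/infra/commit_queue/events/count")
	c.record("success", 498)
	c.record("failure", 2)
	fmt.Println(c.name, c.get("success"), c.get("failure"))
}
```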
Types
type InsertIDGenerator
```go
type InsertIDGenerator struct {
	// Counter is an atomically-managed counter used to differentiate Insert
	// IDs produced by the same process.
	Counter int64

	// Prefix should be able to uniquely identify this specific process,
	// to differentiate Insert IDs produced by different processes.
	//
	// If empty, prefix will be derived from system and process specific
	// properties.
	Prefix string
}
```
InsertIDGenerator generates unique Insert IDs.
BigQuery uses Insert IDs to deduplicate rows in the streaming insert buffer. The association between Insert ID and row persists only for the time the row is in the buffer.
InsertIDGenerator is safe for concurrent use.
var ID InsertIDGenerator
ID is the global InsertIDGenerator
func (*InsertIDGenerator) Generate
func (id *InsertIDGenerator) Generate() string
Generate returns a unique Insert ID.
type Row
```go
type Row struct {
	proto.Message // embedded

	// InsertID is unique per insert operation to handle deduplication.
	InsertID string
}
```
Row implements bigquery.ValueSaver
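bigquery.ValueSaver has a single method, `Save() (row map[string]bigquery.Value, insertID string, err error)`, and the insertID it returns is what BigQuery uses for deduplication. The sketch below mirrors that interface locally so it compiles without the BigQuery client, and uses a hypothetical `eventRow` with a plain map in place of the embedded proto.Message; Row's actual proto-to-row conversion is not shown.

```go
package main

import (
	"errors"
	"fmt"
)

// Value and ValueSaver mirror the shapes of bigquery.Value and
// bigquery.ValueSaver so this sketch compiles on its own.
type Value interface{}

type ValueSaver interface {
	Save() (row map[string]Value, insertID string, err error)
}

// eventRow is hypothetical: Fields stands in for the embedded
// proto.Message, and InsertID is handed back to BigQuery for deduplication.
type eventRow struct {
	Fields   map[string]Value
	InsertID string
}

// Save returns the column values together with the row's insert ID.
func (r *eventRow) Save() (map[string]Value, string, error) {
	if r.Fields == nil {
		return nil, "", errors.New("eventRow has no fields")
	}
	return r.Fields, r.InsertID, nil
}

// Compile-time check that *eventRow satisfies ValueSaver.
var _ ValueSaver = (*eventRow)(nil)

func main() {
	r := &eventRow{Fields: map[string]Value{"cl": 12345}, InsertID: "worker-7:1"}
	row, id, err := r.Save()
	fmt.Println(row["cl"], id, err) // 12345 worker-7:1 <nil>
}
```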
type Uploader
```go
type Uploader struct {
	*bigquery.Uploader

	// Uploader is bound to a specific table. DatasetID and TableID are
	// provided for reference.
	DatasetID string
	TableID   string

	// UploadsMetricName is a string used to create a tsmon Counter metric
	// for event upload attempts via Put, e.g.
	// "/chrome/infra/commit_queue/events/count". If unset, no metric will
	// be created.
	UploadsMetricName string

	// BatchSize is the max number of rows to send to BigQuery at a time.
	// The default is 500.
	BatchSize int
	// contains filtered or unexported fields
}
```
Uploader contains the necessary data for streaming data to BigQuery.
func NewUploader
NewUploader constructs a new Uploader struct.
DatasetID and TableID are provided to the BigQuery client to gain access to a particular table.
You may want to change the default configuration of the embedded bigquery.Uploader; see its documentation for details.
Set UploadsMetricName on the resulting Uploader to use the default counter metric.
Set BatchSize to use a custom batch size; the default is 500.
func (*Uploader) Put
Put uploads one or more rows to the BigQuery service. Put takes care of adding InsertIDs, which BigQuery uses to deduplicate rows.
If any rows do not match one of the expected types, Put does not attempt to upload any rows and returns an InvalidTypeError.
Put returns a PutMultiError if one or more rows failed to be uploaded. The PutMultiError contains a RowInsertionError for each failed row.
Put retries on temporary errors. If an error persists, Put keeps retrying indefinitely; because of this, if ctx does not have a timeout, Put adds one.
See the bigquery package documentation and source code for details on how struct values are mapped to rows.