Documentation ¶
Overview ¶
Package bq implements the export of datastore objects to bigquery.
See go/swarming/bq for more information.
Index ¶
Constants ¶
const (
	// TaskRequests exports model.TaskRequest to the `task_requests` table.
	TaskRequests = "task_requests"

	// BotEvents exports model.BotEvent to the `bot_events` table.
	BotEvents = "bot_events"

	// TaskRunResults exports model.TaskRunResults to the `task_results_run`
	// table.
	TaskRunResults = "task_results_run"

	// TaskResultSummaries exports model.TaskResultSummaries to the
	// `task_results_summary` table.
	TaskResultSummaries = "task_results_summary"
)
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AbstractFetcher ¶
type AbstractFetcher interface {
	// Descriptor is the proto descriptor of the BQ row protobuf message.
	Descriptor() *descriptorpb.DescriptorProto

	// Fetch fetches entities, converts them to BQ rows and flushes the result.
	Fetch(ctx context.Context, start time.Time, duration time.Duration, flushers []*Flusher) error
}
AbstractFetcher is implemented by all Fetcher[E, R].
func BotEventsFetcher ¶
func BotEventsFetcher() AbstractFetcher
BotEventsFetcher makes a fetcher that can produce BotEvent BQ rows.
func TaskRequestFetcher ¶
func TaskRequestFetcher() AbstractFetcher
TaskRequestFetcher makes a fetcher that can produce TaskRequest BQ rows.
func TaskResultSummariesFetcher ¶
func TaskResultSummariesFetcher() AbstractFetcher
TaskResultSummariesFetcher makes a fetcher that can produce TaskResult BQ rows.
func TaskRunResultsFetcher ¶
func TaskRunResultsFetcher() AbstractFetcher
TaskRunResultsFetcher makes a fetcher that can produce TaskResult BQ rows.
type ExportOp ¶
type ExportOp struct {
	BQClient    *managedwriter.Client   // the BigQuery client to use
	PSClient    *pubsub.PublisherClient // the PubSub client to use
	OperationID string                  // ID of this particular export operation
	TableID     string                  // full table name to write results into
	Topic       string                  // if set, export rows to this PubSub topic
	Fetcher     AbstractFetcher         // fetches data and converts it to [][]byte
	// contains filtered or unexported fields
}
ExportOp can fetch data from datastore and write it to BigQuery and PubSub.
It can fail with a transient error and be retried. On a retry it will attempt to finish the previous export (if possible).
May publish duplicate rows to PubSub on retries, since PubSub doesn't support committed writes in the same way BigQuery does.
type ExportSchedule ¶
type ExportSchedule struct {
	// ID is the BQ table name where events are exported to.
	//
	// It is one of the constants defined above, e.g. TaskRequests.
	ID string `gae:"$id"`

	// NextExport is a timestamp to start exporting events from.
	//
	// It is updated by scheduleExportTasks once it schedules TQ tasks that do
	// actual export.
	NextExport time.Time `gae:",noindex"`
	// contains filtered or unexported fields
}
ExportSchedule stores the per-table timestamp to start exporting events from.
type ExportState ¶
type ExportState struct {
	// ID is taskspb.ExportInterval.OperationId of the corresponding export task.
	//
	// This ID is derived based on the destination table name and the time
	// interval being exported.
	ID string `gae:"$id"`

	// WriteStreamName is the name of the BigQuery WriteStream being exported to.
	//
	// Used when restarting the export task after transient errors to check if
	// the stream is committed.
	WriteStreamName string `gae:",noindex"`

	// ExpireAt is a timestamp when this entity can be deleted.
	//
	// Used in the Cloud Datastore TTL policy.
	ExpireAt time.Time `gae:",noindex"`
	// contains filtered or unexported fields
}
ExportState stores the BQ WriteStream associated with an export task.
It exists only if the write stream is already finalized and either already committed or about to be committed. The purpose of this entity is to make sure that if an export task operation is retried, it doesn't accidentally export the same data again.
type Fetcher ¶
Fetcher knows how to fetch Datastore data that should be exported to BQ.
`E` is the entity struct to fetch (e.g. model.TaskResultSummary) and `R` is the proto message representing the row (e.g. *bqpb.TaskResult).
func (*Fetcher[E, R]) Descriptor ¶
func (f *Fetcher[E, R]) Descriptor() *descriptorpb.DescriptorProto
Descriptor is the proto descriptor of the row protobuf message.
func (*Fetcher[E, R]) Fetch ¶
func (f *Fetcher[E, R]) Fetch(ctx context.Context, start time.Time, duration time.Duration, flushers []*Flusher) error
Fetch fetches entities, converts them to BQ rows and flushes the result.
Visits entities in the range `[start, start+duration)`. Calls the flushers' `Flush` callbacks when it wants to send a batch of rows to BQ.
type Flusher ¶
type Flusher struct {
	// CountThreshold is how many rows to buffer before flushing them.
	CountThreshold int

	// ByteThreshold is how many bytes to buffer before flushing them.
	ByteThreshold int

	// Marshal converts a row proto to a raw byte message.
	Marshal func(proto.Message) ([]byte, error)

	// Flush flushes a batch of converted messages.
	Flush func([][]byte) error
	// contains filtered or unexported fields
}
Flusher knows how to serialize rows and flush a bunch of them.