bq

package
v0.0.0-...-a70aae3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 14, 2024 License: Apache-2.0 Imports: 42 Imported by: 0

Documentation

Overview

Package bq implements the export of datastore objects to bigquery.

See go/swarming/bq for more information.

Index

Constants

View Source
const (
	// TaskRequests exports model.TaskRequest to the `task_requests` table.
	TaskRequests = "task_requests"

	// BotEvents exports model.BotEvent to the `bot_events` table.
	BotEvents = "bot_events"

	// TaskRunResults exports model.TaskRunResults to the `task_results_run`
	// table.
	TaskRunResults = "task_results_run"

	// TaskResultSummaries exports model.TaskResultSummaries to the
	// `task_results_summary` table.
	TaskResultSummaries = "task_results_summary"
)

Variables

This section is empty.

Functions

func Register

func Register(
	srv *server.Server,
	disp *tq.Dispatcher,
	cron *cron.Dispatcher,
	dataset, onlyOneTable string,
	exportToPubSub stringset.Set,
	pubSubTopicPrefix string,
) error

Register registers TQ tasks and cron handlers that implement BigQuery export.

Types

type AbstractFetcher

type AbstractFetcher interface {
	// Descriptor is the proto descriptor of the BQ row protobuf message.
	Descriptor() *descriptorpb.DescriptorProto
	// Fetch fetches entities, converts them to BQ rows and flushes the result.
	Fetch(ctx context.Context, start time.Time, duration time.Duration, flushers []*Flusher) error
}

AbstractFetcher is implemented by all Fetcher[E, R].

func BotEventsFetcher

func BotEventsFetcher() AbstractFetcher

BotEventsFetcher makes a fetcher that can produce BotEvent BQ rows.

func TaskRequestFetcher

func TaskRequestFetcher() AbstractFetcher

TaskRequestFetcher makes a fetcher that can produce TaskRequest BQ rows.

func TaskResultSummariesFetcher

func TaskResultSummariesFetcher() AbstractFetcher

TaskResultSummariesFetcher makes a fetcher that can produce TaskResult BQ rows.

func TaskRunResultsFetcher

func TaskRunResultsFetcher() AbstractFetcher

TaskRunResultsFetcher makes a fetcher that can produce TaskResult BQ rows.

type ExportOp

type ExportOp struct {
	BQClient    *managedwriter.Client   // the BigQuery client to use
	PSClient    *pubsub.PublisherClient // the PubSub client to use
	OperationID string                  // ID of this particular export operation
	TableID     string                  // full table name to write results into
	Topic       string                  // if set, export rows to this PubSub topic
	Fetcher     AbstractFetcher         // fetches data and coverts it to [][]byte
	// contains filtered or unexported fields
}

ExportOp can fetch data from datastore and write it to BigQuery and PubSub.

It can fail with a transient error and be retried. On a retry it will attempt to finish the previous export (if possible).

May publish duplicate rows to PubSub on retries, since PubSub doesn't support committed writes in the same way BigQuery does.

func (*ExportOp) Close

func (p *ExportOp) Close(ctx context.Context)

Close cleans up resources.

func (*ExportOp) Execute

func (p *ExportOp) Execute(ctx context.Context, start time.Time, duration time.Duration) error

Execute performs the export operation.

type ExportSchedule

type ExportSchedule struct {

	// ID is the BQ table name where events are exported to.
	//
	// It is one of the constants defined above, e.g. TaskRequests.
	ID string `gae:"$id"`

	// NextExport is a timestamp to start exporting events from.
	//
	// It is updated by scheduleExportTasks once it schedules TQ tasks that do
	// actual export.
	NextExport time.Time `gae:",noindex"`
	// contains filtered or unexported fields
}

ExportSchedule stores the per-table timestamp to start exporting events from.

type ExportState

type ExportState struct {

	// ID is taskspb.ExportInterval.OperationId of the corresponding export task.
	//
	// This ID is derived based on the destination table name and the time
	// interval being exported.
	ID string `gae:"$id"`

	// WriteStreamName is name of the BigQuery WriteStream being exported to.
	//
	// Used when restarting the export task after transient errors to check if
	// the stream is committed.
	WriteStreamName string `gae:",noindex"`

	// ExpireAt is a timestamp when this entity can be deleted.
	//
	// Used in the Cloud Datastore TTL policy.
	ExpireAt time.Time `gae:",noindex"`
	// contains filtered or unexported fields
}

ExportState stores the BQ WriteStream associated with an export task.

It exists only if the write stream is already finalize and either already committed or is about to be committed. The purpose of this entity is to make sure that if an export task operation is retried, it doesn't accidentally export the same data again.

type Fetcher

type Fetcher[E any, R proto.Message] struct {
	// contains filtered or unexported fields
}

Fetcher knows how to fetch Datastore data that should be exported to BQ.

`E` is the entity struct to fetch (e.g. model.TaskResultSummary) and `R` is the proto message representing the row (e.g. *bqpb.TaskResult).

func (*Fetcher[E, R]) Descriptor

func (f *Fetcher[E, R]) Descriptor() *descriptorpb.DescriptorProto

Descriptor is the proto descriptor of the row protobuf message.

func (*Fetcher[E, R]) Fetch

func (f *Fetcher[E, R]) Fetch(ctx context.Context, start time.Time, duration time.Duration, flushers []*Flusher) error

Fetch fetches entities, converts them to BQ rows and flushes the result.

Visits entities in range `[start, start+duration)`. Calls `flush` callback when it wants to send a batch of rows to BQ.

type Flusher

type Flusher struct {
	// CountThreshold is how many rows to buffer before flushing them.
	CountThreshold int
	// ByteThreshold is how many bytes to buffer before flushing them.
	ByteThreshold int
	// Marshal converts a row proto to a raw byte message.
	Marshal func(proto.Message) ([]byte, error)
	// Flush flushes a batch of converted messages.
	Flush func([][]byte) error
	// contains filtered or unexported fields
}

Flusher knows how to serialize rows and flush a bunch of them.

Directories

Path Synopsis
Package taskspb contains proto definitions for tq.Tasks payloads used in swarming-go.
Package taskspb contains proto definitions for tq.Tasks payloads used in swarming-go.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL