Documentation ¶
Overview ¶
Package dataflowlib translates a Beam pipeline model to the Dataflow API job model, for submission to Google Cloud Dataflow.
Index ¶
- func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, ...) (*dataflowPipelineResult, error)
- func Fixup(p *pipepb.Pipeline) (*pipepb.Pipeline, error)
- func FromMetricUpdates(allMetrics []*df.MetricUpdate, job *df.Job) *metrics.Results
- func GetMetrics(ctx context.Context, client *df.Service, project, region, jobID string) (*df.JobMetrics, error)
- func NewClient(ctx context.Context, endpoint string) (*df.Service, error)
- func PrintJob(ctx context.Context, job *df.Job)
- func StageFile(ctx context.Context, project, url, filename string) error
- func StageModel(ctx context.Context, project, modelURL string, model []byte) error
- func Submit(ctx context.Context, client *df.Service, project, region string, job *df.Job) (*df.Job, error)
- func Translate(ctx context.Context, p *pipepb.Pipeline, opts *JobOptions, ...) (*df.Job, error)
- func WaitForCompletion(ctx context.Context, client *df.Service, project, region, jobID string) error
- type JobOptions
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Execute ¶
func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, workerURL, jarURL, modelURL, endpoint string, async bool) (*dataflowPipelineResult, error)
Execute submits a pipeline as a Dataflow job.
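A minimal usage sketch (the GCS paths, project name, and region are placeholders; ctx is an ambient context.Context and raw is a *pipepb.Pipeline built elsewhere). Execute itself handles staging, translation, and submission:

    opts := &dataflowlib.JobOptions{
        Name:         "my-job",     // placeholder
        Project:      "my-project", // placeholder
        Region:       "us-central1",
        TempLocation: "gs://my-bucket/tmp",
    }
    // Hypothetical staging locations; jarURL is empty when no custom worker jar is used.
    workerURL := "gs://my-bucket/staging/worker"
    modelURL := "gs://my-bucket/staging/model"
    _, err := dataflowlib.Execute(ctx, raw, opts, workerURL, "" /* jarURL */, modelURL, "" /* endpoint */, false /* async */)
    if err != nil {
        log.Fatalf("job failed: %v", err)
    }

The concrete result type is unexported, so the returned value is discarded here and only the error is inspected.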
func FromMetricUpdates ¶
func FromMetricUpdates(allMetrics []*df.MetricUpdate, job *df.Job) *metrics.Results
FromMetricUpdates extracts metrics from a slice of MetricUpdate objects and groups them into counters, distributions and gauges.
Dataflow currently only reports Counter and Distribution metrics to Cloud Monitoring. Gauge metrics are not supported. The output metrics.Results will not contain any gauges.
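For illustration, the grouped results can then be queried through the metrics package; a sketch, assuming jm is a *df.JobMetrics (for example from GetMetrics below) and job is the corresponding *df.Job:

    results := dataflowlib.FromMetricUpdates(jm.Metrics, job)
    qr := results.AllMetrics()
    for _, c := range qr.Counters() {
        fmt.Println(c) // attempted and committed counter values
    }
    for _, d := range qr.Distributions() {
        fmt.Println(d) // count/sum/min/max distribution values
    }
    // qr.Gauges() stays empty: Dataflow does not report gauge metrics.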
func GetMetrics ¶
func GetMetrics(ctx context.Context, client *df.Service, project, region, jobID string) (*df.JobMetrics, error)
GetMetrics returns a collection of metrics describing the progress of a job by making a call to the Cloud Monitoring service.
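A sketch of fetching the raw metric updates for a job (project, region, and jobID are placeholders; client is a *df.Service from NewClient):

    jm, err := dataflowlib.GetMetrics(ctx, client, "my-project", "us-central1", jobID)
    if err != nil {
        log.Fatalf("failed to fetch metrics: %v", err)
    }
    for _, mu := range jm.Metrics {
        fmt.Printf("%s = %v\n", mu.Name.Name, mu.Scalar) // structured name and scalar value, if any
    }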
func NewClient ¶
func NewClient(ctx context.Context, endpoint string) (*df.Service, error)
NewClient creates a new Dataflow client with default application credentials and CloudPlatformScope. The Dataflow endpoint is optionally overridden.
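A sketch, assuming application default credentials are available in the environment; an empty endpoint keeps the default Dataflow API endpoint:

    client, err := dataflowlib.NewClient(ctx, "")
    if err != nil {
        log.Fatalf("failed to create Dataflow client: %v", err)
    }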
func StageModel ¶
func StageModel(ctx context.Context, project, modelURL string, model []byte) error
StageModel uploads the pipeline model to GCS as a unique object.
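A sketch of serializing and staging a pipeline proto (the bucket path is a placeholder, and proto is google.golang.org/protobuf/proto):

    model, err := proto.Marshal(p) // p is the *pipepb.Pipeline to stage
    if err != nil {
        log.Fatalf("failed to marshal pipeline: %v", err)
    }
    if err := dataflowlib.StageModel(ctx, "my-project", "gs://my-bucket/staging/model", model); err != nil {
        log.Fatalf("failed to stage model: %v", err)
    }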
func Submit ¶
func Submit(ctx context.Context, client *df.Service, project, region string, job *df.Job) (*df.Job, error)
Submit submits a prepared job to Cloud Dataflow.
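A sketch of submitting a prepared *df.Job (typically built by Translate) and then blocking on it with WaitForCompletion; project and region are placeholders:

    submitted, err := dataflowlib.Submit(ctx, client, "my-project", "us-central1", job)
    if err != nil {
        log.Fatalf("failed to submit job: %v", err)
    }
    log.Printf("submitted job %s", submitted.Id)
    if err := dataflowlib.WaitForCompletion(ctx, client, "my-project", "us-central1", submitted.Id); err != nil {
        log.Fatalf("job failed: %v", err)
    }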
Types ¶
type JobOptions ¶
type JobOptions struct {
	// Name is the job name.
	Name string
	// Experiments are additional experiments.
	Experiments []string
	// Pipeline options
	Options runtime.RawOptions

	Project             string
	Region              string
	Zone                string
	Network             string
	Subnetwork          string
	NoUsePublicIPs      bool
	NumWorkers          int64
	DiskSizeGb          int64
	MachineType         string
	Labels              map[string]string
	ServiceAccountEmail string
	WorkerRegion        string
	WorkerZone          string

	// Autoscaling settings
	Algorithm     string
	MaxNumWorkers int64

	TempLocation string

	// Worker is the worker binary override.
	Worker string
	// WorkerJar is a custom worker jar.
	WorkerJar string

	TeardownPolicy string
}
JobOptions capture the various options for submitting jobs to Dataflow.
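A sketch of populating the most common fields (all values are placeholders):

    opts := &dataflowlib.JobOptions{
        Name:          "wordcount",
        Project:       "my-project",
        Region:        "us-central1",
        NumWorkers:    2,
        MachineType:   "n1-standard-2",
        MaxNumWorkers: 10, // upper bound for autoscaling
        TempLocation:  "gs://my-bucket/tmp",
        Labels:        map[string]string{"team": "data"},
    }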