Documentation ¶
Overview ¶
Package graph is the internal representation of the Beam execution plan. This package is used by the public-facing Beam package to organize the user's pipeline into a connected graph structure. This graph is a precise, strongly-typed representation of the user's intent, and allows the runtime to verify typing of collections, and tracks the data dependency relationships to allow an optimizer to schedule the work.
Index ¶
- Constants
- Variables
- func Bounded(ns []*Node) bool
- func CoGBKMainInput(components int) func(*config)
- func IsLifecycleMethod(n string) bool
- func NewNamespaceGenerator(n int) func() string
- func NodeTypes(list []*Node) []typex.FullType
- func NumMainInputs(num mainInputs) func(*config)
- type CombineFn
- func (f *CombineFn) AddInputFn() *funcx.Fn
- func (f *CombineFn) CompactFn() *funcx.Fn
- func (f *CombineFn) CreateAccumulatorFn() *funcx.Fn
- func (f *CombineFn) ExtractOutputFn() *funcx.Fn
- func (f *CombineFn) MergeAccumulatorsFn() *funcx.Fn
- func (f *CombineFn) Name() string
- func (f *CombineFn) SetupFn() *funcx.Fn
- func (f *CombineFn) TeardownFn() *funcx.Fn
- type DoFn
- func (f *DoFn) Annotations() map[string][]byte
- func (f *DoFn) FinishBundleFn() *funcx.Fn
- func (f *DoFn) IsSplittable() bool
- func (f *DoFn) Name() string
- func (f *DoFn) OnTimerFn() (*funcx.Fn, bool)
- func (f *DoFn) PipelineState() []state.PipelineState
- func (f *DoFn) PipelineTimers() ([]timers.PipelineTimer, []string)
- func (f *DoFn) ProcessElementFn() *funcx.Fn
- func (f *DoFn) SetupFn() *funcx.Fn
- func (f *DoFn) StartBundleFn() *funcx.Fn
- func (f *DoFn) TeardownFn() *funcx.Fn
- type DynFn
- type ExpandedTransform
- type ExternalTransform
- type Fn
- type Graph
- func (g *Graph) Build() ([]*MultiEdge, []*Node, error)
- func (g *Graph) NewEdge(parent *Scope) *MultiEdge
- func (g *Graph) NewNode(t typex.FullType, w *window.WindowingStrategy, bounded bool) *Node
- func (g *Graph) NewScope(parent *Scope, name string) *Scope
- func (g *Graph) Root() *Scope
- func (g *Graph) String() string
- type Inbound
- type InputKind
- type MultiEdge
- func NewCoGBK(g *Graph, s *Scope, ns []*Node) (*MultiEdge, error)
- func NewCombine(g *Graph, s *Scope, u *CombineFn, in *Node, ac *coder.Coder, ...) (*MultiEdge, error)
- func NewCrossLanguage(g *Graph, s *Scope, ext *ExternalTransform, ins []*Inbound, outs []*Outbound) (*MultiEdge, func(*Node, bool))
- func NewExternal(g *Graph, s *Scope, payload *Payload, in []*Node, out []typex.FullType, ...) *MultiEdge
- func NewFlatten(g *Graph, s *Scope, in []*Node) (*MultiEdge, error)
- func NewImpulse(g *Graph, s *Scope, value []byte) *MultiEdge
- func NewParDo(g *Graph, s *Scope, u *DoFn, in []*Node, rc *coder.Coder, ...) (*MultiEdge, error)
- func NewReshuffle(g *Graph, s *Scope, in *Node) (*MultiEdge, error)
- func NewTaggedExternal(g *Graph, s *Scope, payload *Payload, ins []*Inbound, outs []*Outbound, ...) *MultiEdge
- func NewWindowInto(g *Graph, s *Scope, ws *window.WindowingStrategy, in *Node) *MultiEdge
- type Node
- type Opcode
- type Outbound
- type Payload
- type Scope
- type SplittableDoFn
- func (f *SplittableDoFn) CreateInitialRestrictionFn() *funcx.Fn
- func (f *SplittableDoFn) CreateTrackerFn() *funcx.Fn
- func (f *SplittableDoFn) CreateWatermarkEstimatorFn() *funcx.Fn
- func (f *SplittableDoFn) HasTruncateRestriction() bool
- func (f *SplittableDoFn) InitialWatermarkEstimatorStateFn() *funcx.Fn
- func (f *SplittableDoFn) IsStatefulWatermarkEstimating() bool
- func (f *SplittableDoFn) IsWatermarkEstimating() bool
- func (f *SplittableDoFn) Name() string
- func (f *SplittableDoFn) RestrictionSizeFn() *funcx.Fn
- func (f *SplittableDoFn) RestrictionT() reflect.Type
- func (f *SplittableDoFn) SplitRestrictionFn() *funcx.Fn
- func (f *SplittableDoFn) TruncateRestrictionFn() *funcx.Fn
- func (f *SplittableDoFn) WatermarkEstimatorStateFn() *funcx.Fn
- func (f *SplittableDoFn) WatermarkEstimatorStateT() reflect.Type
- func (f *SplittableDoFn) WatermarkEstimatorT() reflect.Type
Constants ¶
const ( MainUnknown mainInputs = -1 // Number of inputs is unknown for DoFn validation. MainSingle mainInputs = 1 // Number of inputs for single value elements. MainKv mainInputs = 2 // Number of inputs for KV elements. )
The following constants prefixed with "Main" represent valid numbers of DoFn main inputs for DoFn construction and validation.
const CombinePerKeyScope = "CombinePerKey"
CombinePerKeyScope is the Go SDK canonical name for the combine composite scope. With Beam Portability, "primitive" composite transforms like combine have their URNs & payloads attached to a high level scope, with a default representation beneath. The use of this const permits the translation layer to confirm the SDK expects this combine to be liftable by a runner and should set this scope's URN and Payload accordingly.
Variables ¶
var ( // UnnamedInputTag is a constant random string used when an ExternalTransform // expects a single unnamed input. xlangx and graphx use it to explicitly // bypass steps in pipeline construction meant for named inputs UnnamedInputTag string // UnnamedOutputTag is a constant random string used when an ExternalTransform // expects a single unnamed output. xlangx and graphx use it to explicitly // bypass steps in pipeline construction meant for named outputs. UnnamedOutputTag string // NewNamespace is a utility random string generator used by the xlang to // scope individual ExternalTransforms by a unique namespace NewNamespace func() string )
Functions ¶
func CoGBKMainInput ¶
func CoGBKMainInput(components int) func(*config)
CoGBKMainInput is an optional config to NewDoFn which specifies the number of components of a CoGBK input to the DoFn being created, allowing for more complete validation.
Example usage:
var col beam.PCollection graph.NewDoFn(fn, graph.CoGBKMainInput(len(col.Type().Components())))
func IsLifecycleMethod ¶
IsLifecycleMethod return true if the passed in string is one of the lifecycle method names used by the Go SDK as DoFn or CombineFn lifecycle methods. These are the only methods that need shims generated for them.
func NewNamespaceGenerator ¶
NewNamespaceGenerator returns a functions that generates a random string of n alphabets
func NumMainInputs ¶
func NumMainInputs(num mainInputs) func(*config)
NumMainInputs is an optional config to NewDoFn which specifies the number of main inputs to the DoFn being created, allowing for more complete validation. Valid inputs are the package constants of type mainInputs.
Example usage:
graph.NewDoFn(fn, graph.NumMainInputs(graph.MainKv))
Types ¶
type CombineFn ¶
type CombineFn Fn
CombineFn represents a CombineFn.
func AsCombineFn ¶
AsCombineFn converts a Fn to a CombineFn, if possible.
func NewCombineFn ¶
NewCombineFn constructs a CombineFn from the given value, if possible.
func (*CombineFn) AddInputFn ¶
AddInputFn returns the "AddInput" function, if present.
func (*CombineFn) CreateAccumulatorFn ¶
CreateAccumulatorFn returns the "CreateAccumulator" function, if present.
func (*CombineFn) ExtractOutputFn ¶
ExtractOutputFn returns the "ExtractOutput" function, if present.
func (*CombineFn) MergeAccumulatorsFn ¶
MergeAccumulatorsFn returns the "MergeAccumulators" function. If it is the only method present, then InputType == AccumulatorType == OutputType.
func (*CombineFn) TeardownFn ¶
TeardownFn returns the "Teardown" function, if present.
type DoFn ¶
type DoFn Fn
DoFn represents a DoFn.
func AsDoFn ¶
AsDoFn converts a Fn to a DoFn, if possible. numMainIn specifies how many main inputs are expected in the DoFn's method signatures. Valid inputs are the package constants of type mainInputs. If that number is MainUnknown then validation is done by best effort and may miss some edge cases.
func (*DoFn) Annotations ¶
Annotations returns the optional annotations of the DoFn, if present.
func (*DoFn) FinishBundleFn ¶
FinishBundleFn returns the "FinishBundle" function, if present.
func (*DoFn) IsSplittable ¶
IsSplittable returns whether the DoFn is a valid Splittable DoFn.
func (*DoFn) OnTimerFn ¶
OnTimerFn return the "OnTimer" function and a bool indicating whether the function is defined or not for the DoFn.
func (*DoFn) PipelineState ¶
func (f *DoFn) PipelineState() []state.PipelineState
PipelineState returns a list of PipelineState objects used to access/mutate global pipeline state (if any).
func (*DoFn) PipelineTimers ¶
func (f *DoFn) PipelineTimers() ([]timers.PipelineTimer, []string)
PipelineTimers returns the list of PipelineTimer objects defined for the DoFn.
func (*DoFn) ProcessElementFn ¶
ProcessElementFn returns the "ProcessElement" function.
func (*DoFn) StartBundleFn ¶
StartBundleFn returns the "StartBundle" function, if present.
func (*DoFn) TeardownFn ¶
TeardownFn returns the "Teardown" function, if present.
type DynFn ¶
type DynFn struct { // Name is the name of the function. It does not have to be a valid symbol. Name string // T is the type of the generated function T reflect.Type // Data holds the data, if any, for the generator. Each function // generator typically needs some configuration data, which is // required by the DynFn to be encoded. Data []byte // Gen is the function generator. The function generator itself must be a // function with a unique symbol. Gen func(string, reflect.Type, []byte) reflectx.Func }
DynFn is a generator for dynamically-created functions:
gen: (name string, t reflect.Type, []byte) -> func : T
where the generated function, fn : T, is re-created at runtime. This concept allows serialization of dynamically-generated functions, which do not have a valid (unique) symbol such as one created via reflect.MakeFunc.
type ExpandedTransform ¶
type ExpandedTransform struct { Components any // *pipepb.Components Transform any // *pipepb.PTransform Requirements []string }
ExpandedTransform stores the expansion response associated to each ExternalTransform.
Components and Transform fields are purposely typed as any to avoid unnecesary proto related imports into graph.
type ExternalTransform ¶
type ExternalTransform struct { Namespace string Urn string Payload []byte ExpansionAddr string InputsMap map[string]int OutputsMap map[string]int Expanded *ExpandedTransform }
ExternalTransform represents the cross-language transform in and out of pipeline graph. It is associated with each MultiEdge and it's Inbound and Outbound links. It also stores the associated expansion response within the Expanded field.
func (ExternalTransform) WithNamedInputs ¶
func (ext ExternalTransform) WithNamedInputs(inputsMap map[string]int) ExternalTransform
WithNamedInputs adds a map (tag -> index of Inbound in MultiEdge.Input) of named inputs corresponsing to ExternalTransform's InputsMap
func (ExternalTransform) WithNamedOutputs ¶
func (ext ExternalTransform) WithNamedOutputs(outputsMap map[string]int) ExternalTransform
WithNamedOutputs adds a map (tag -> index of Outbound in MultiEdge.Output) of named outputs corresponsing to ExternalTransform's OutputsMap
type Fn ¶
type Fn struct { // Fn holds the function, if present. If Fn is nil, Recv must be // non-nil. Fn *funcx.Fn // Recv hold the struct receiver, if present. If Recv is nil, Fn // must be non-nil. Recv any // DynFn holds the function-generator, if dynamic. If not nil, Fn // holds the generated function. DynFn *DynFn // contains filtered or unexported fields }
Fn holds either a function or struct receiver.
type Graph ¶
type Graph struct {
// contains filtered or unexported fields
}
Graph represents an in-progress deferred execution graph and is easily translatable to the model graph. This graph representation allows precise control over scope and connectivity.
func (*Graph) Build ¶
Build performs finalization on the graph. It verifies the correctness of the graph structure, typechecks the plan and returns a slice of the edges in the graph.
type Inbound ¶
type Inbound struct { // Kind presents the form of the data that the edge expects. Main input // must be processed element-wise, but side input may take several // convenient forms. For example, a DoFn that processes ints may choose // among the following parameter types: // // * Main: int // * Singleton: int // * Slice: []int // * Iter: func(*int) bool // * ReIter: func() func(*int) bool // // If the DoFn is generic then int may be replaced by any of the type // variables. For example, // // * Slice: []typex.T // * Iter: func(*typex.X) bool // // If the input type is KV<int,string>, say, then the options are: // // * Main: int, string (as two separate parameters) // * Map: map[int]string // * MultiMap: map[int][]string // * Iter: func(*int, *string) bool // * ReIter: func() func(*int, *string) bool // // As above, either int, string, or both can be replaced with type // variables. For example, // // * Map: map[typex.X]typex.Y // * MultiMap: map[typex.T][]string // * Iter: func(*typex.Z, *typex.Z) bool // // Note that in the last case the parameter type requires that both // the key and value types are identical. Bind enforces such constraints. Kind InputKind // From is the incoming node in the graph. From *Node // Type is the fulltype matching the actual type used by the transform. // Due to the loose signatures of DoFns, we can only determine the // inbound structure when the fulltypes of the incoming links are present. // For example, // // func (ctx context.Context, key int, value typex.X) error // // is a generic DoFn that if bound to KV<int,string> would have one // Inbound link with type KV<int, X>. Type typex.FullType }
Inbound represents an inbound data link from a Node.
func NamedInboundLinks ¶
NamedInboundLinks returns an array of new Inbound links and a map (tag -> index of Inbound in MultiEdge.Input) of corresponding indices with respect to their names.
type InputKind ¶
type InputKind string
InputKind represents the role of the input and its shape.
const ( Main InputKind = "Main" Singleton InputKind = "Singleton" Slice InputKind = "Slice" Map InputKind = "Map" // TODO: allow? MultiMap InputKind = "MultiMap" // TODO: allow? Iter InputKind = "Iter" ReIter InputKind = "ReIter" )
Valid input kinds.
func Bind ¶
func Bind(fn *funcx.Fn, typedefs map[string]reflect.Type, in ...typex.FullType) ([]typex.FullType, []InputKind, []typex.FullType, []typex.FullType, error)
Bind returns the inbound, outbound and underlying output types for a Fn, when bound to the underlying input types. The complication of bind is primarily that UserFns have loose signatures and bind must produce valid type information for the execution plan.
For example,
func (t EventTime, k typex.X, v int, emit func(string, typex.X))
or
func (context.Context, k typex.X, v int) (string, typex.X, error)
are UserFns that may take one or two incoming fulltypes: either KV<X,int> or X with a singleton side input of type int. For the purpose of the shape of data processing, the two forms are equivalent. The non-data types, context.Context and error, are not part of the data signature, but in play only at runtime.
If either was bound to the input type [KV<string,int>], bind would return:
inbound: [Main: KV<X,int>] outbound: [KV<string,X>] output: [KV<string,string>]
Note that it propagates the assignment of X to string in the output type.
If either was instead bound to the input fulltypes [float, int], the result would be:
inbound: [Main: X, Singleton: int] outbound: [KV<string,X>] output: [KV<string, float>]
Here, the inbound shape and output types are different from before.
type MultiEdge ¶
type MultiEdge struct { Op Opcode DoFn *DoFn // ParDo RestrictionCoder *coder.Coder // SplittableParDo StateCoders map[string]*coder.Coder // Stateful ParDo CombineFn *CombineFn // Combine AccumCoder *coder.Coder // Combine Value []byte // Impulse External *ExternalTransform // Current External Transforms API Payload *Payload // Legacy External Transforms API WindowFn *window.Fn // WindowInto Input []*Inbound Output []*Outbound // contains filtered or unexported fields }
MultiEdge represents a primitive data processing operation. Each non-user code operation may be implemented by either the harness or the runner.
func NewCombine ¶
func NewCombine(g *Graph, s *Scope, u *CombineFn, in *Node, ac *coder.Coder, typedefs map[string]reflect.Type) (*MultiEdge, error)
NewCombine inserts a new Combine edge into the graph. Combines cannot have side input.
func NewCrossLanguage ¶
func NewCrossLanguage(g *Graph, s *Scope, ext *ExternalTransform, ins []*Inbound, outs []*Outbound) (*MultiEdge, func(*Node, bool))
NewCrossLanguage inserts a Cross-langugae External transform using initialized input and output nodes
func NewExternal ¶
func NewExternal(g *Graph, s *Scope, payload *Payload, in []*Node, out []typex.FullType, bounded bool) *MultiEdge
NewExternal inserts an External transform. The system makes no assumptions about what this transform might do.
func NewFlatten ¶
NewFlatten inserts a new Flatten edge in the graph. Flatten output type is the shared input type.
func NewImpulse ¶
NewImpulse inserts a new Impulse edge into the graph. It must use the built-in bytes coder.
func NewParDo ¶
func NewParDo(g *Graph, s *Scope, u *DoFn, in []*Node, rc *coder.Coder, typedefs map[string]reflect.Type) (*MultiEdge, error)
NewParDo inserts a new ParDo edge into the graph.
func NewReshuffle ¶
NewReshuffle inserts a new Reshuffle edge into the graph.
func NewTaggedExternal ¶
func NewTaggedExternal(g *Graph, s *Scope, payload *Payload, ins []*Inbound, outs []*Outbound, bounded bool) *MultiEdge
NewTaggedExternal inserts an External transform with tagged inbound and outbound connections. The system makes no assumptions about what this transform might do.
func NewWindowInto ¶
NewWindowInto inserts a new WindowInto edge into the graph.
type Node ¶
type Node struct { // Coder defines the data encoding. It can be changed, but must be of // the underlying type, t. Coder *coder.Coder // contains filtered or unexported fields }
Node is a typed connector describing the data type and encoding. A node may have multiple inbound and outbound connections. The underlying type must be a complete type, i.e., not include any type variables.
func (*Node) WindowingStrategy ¶
func (n *Node) WindowingStrategy() *window.WindowingStrategy
WindowingStrategy returns the window applied to the data.
type Outbound ¶
type Outbound struct { // To is the outgoing node in the graph. To *Node // Type is the fulltype matching the actual type used by the transform. // For DoFns, unlike inbound, the outbound types closely mimic the type // signature. For example, // // func (ctx context.Context, emit func (key int, value typex.X)) error // // is a generic DoFn that produces one Outbound link of type KV<int,X>. Type typex.FullType // representation type of data }
Outbound represents an outbound data link to a Node.
func NamedOutboundLinks ¶
NamedOutboundLinks returns an array of new Outbound links and a map (tag -> index of Outbound in MultiEdge.Output) of corresponding indices with respect to their names.
type Payload ¶
type Payload struct { URN string Data []byte // Optional fields mapping tags to inputs. If present, will override // the default IO tagging for the transform's input PCollections. InputsMap map[string]int // Optional fields mapping tags to outputs. If present, will override // the default IO tagging for the transform's output PCollections. OutputsMap map[string]int }
Payload represents an external payload.
type Scope ¶
type Scope struct { // Label is the human-visible label for this scope. Label string // Parent is the parent scope, if nested. Parent *Scope // Context contains optional metadata associated with this scope. Context context.Context // contains filtered or unexported fields }
Scope is a syntactic Scope, such as arising from a composite Transform. It has no semantic meaning at execution time. Used by monitoring.
type SplittableDoFn ¶
type SplittableDoFn DoFn
SplittableDoFn represents a DoFn implementing SDF methods.
func (*SplittableDoFn) CreateInitialRestrictionFn ¶
func (f *SplittableDoFn) CreateInitialRestrictionFn() *funcx.Fn
CreateInitialRestrictionFn returns the "CreateInitialRestriction" function, if present.
func (*SplittableDoFn) CreateTrackerFn ¶
func (f *SplittableDoFn) CreateTrackerFn() *funcx.Fn
CreateTrackerFn returns the "CreateTracker" function, if present.
func (*SplittableDoFn) CreateWatermarkEstimatorFn ¶
func (f *SplittableDoFn) CreateWatermarkEstimatorFn() *funcx.Fn
CreateWatermarkEstimatorFn returns the "createWatermarkEstimator" function, if present
func (*SplittableDoFn) HasTruncateRestriction ¶
func (f *SplittableDoFn) HasTruncateRestriction() bool
HasTruncateRestriction returns whether the DoFn implements a custom truncate restriction function.
func (*SplittableDoFn) InitialWatermarkEstimatorStateFn ¶
func (f *SplittableDoFn) InitialWatermarkEstimatorStateFn() *funcx.Fn
InitialWatermarkEstimatorStateFn returns the "InitialWatermarkEstimatorState" function, if present
func (*SplittableDoFn) IsStatefulWatermarkEstimating ¶
func (f *SplittableDoFn) IsStatefulWatermarkEstimating() bool
IsStatefulWatermarkEstimating returns whether the DoFn implements custom watermark state.
func (*SplittableDoFn) IsWatermarkEstimating ¶
func (f *SplittableDoFn) IsWatermarkEstimating() bool
IsWatermarkEstimating returns whether the DoFn implements a custom watermark estimator.
func (*SplittableDoFn) Name ¶
func (f *SplittableDoFn) Name() string
Name returns the name of the function or struct.
func (*SplittableDoFn) RestrictionSizeFn ¶
func (f *SplittableDoFn) RestrictionSizeFn() *funcx.Fn
RestrictionSizeFn returns the "RestrictionSize" function, if present.
func (*SplittableDoFn) RestrictionT ¶
func (f *SplittableDoFn) RestrictionT() reflect.Type
RestrictionT returns the restriction type from the SDF.
func (*SplittableDoFn) SplitRestrictionFn ¶
func (f *SplittableDoFn) SplitRestrictionFn() *funcx.Fn
SplitRestrictionFn returns the "SplitRestriction" function, if present.
func (*SplittableDoFn) TruncateRestrictionFn ¶
func (f *SplittableDoFn) TruncateRestrictionFn() *funcx.Fn
TruncateRestrictionFn returns the "TruncateRestriction" function, if present.
func (*SplittableDoFn) WatermarkEstimatorStateFn ¶
func (f *SplittableDoFn) WatermarkEstimatorStateFn() *funcx.Fn
WatermarkEstimatorStateFn returns the "WatermarkEstimatorState" function, if present
func (*SplittableDoFn) WatermarkEstimatorStateT ¶
func (f *SplittableDoFn) WatermarkEstimatorStateT() reflect.Type
WatermarkEstimatorStateT returns the type of the watermark estimator state from the SDF
func (*SplittableDoFn) WatermarkEstimatorT ¶
func (f *SplittableDoFn) WatermarkEstimatorT() reflect.Type
WatermarkEstimatorT returns the type of the watermark estimator from the SDF
Source Files ¶
Directories ¶
Path | Synopsis |
---|---|
Package coder contains coder representation and utilities.
|
Package coder contains coder representation and utilities. |
testutil
Package testutil contains helpers to test and validate custom Beam Schema coders.
|
Package testutil contains helpers to test and validate custom Beam Schema coders. |
Package mtime contains a millisecond representation of time.
|
Package mtime contains a millisecond representation of time. |
Package window contains window representation, windowing strategies and utilities.
|
Package window contains window representation, windowing strategies and utilities. |
trigger
Package trigger helps construct aggregation triggers with beam.WindowInto.
|
Package trigger helps construct aggregation triggers with beam.WindowInto. |