datacatalog

package
v0.1.15 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 3, 2019 License: Apache-2.0 Imports: 22 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GenerateArtifactTagName

func GenerateArtifactTagName(ctx context.Context, inputs *core.LiteralMap) (string, error)

Generate a tag by hashing the input values

func GenerateDatasetIDForTask

func GenerateDatasetIDForTask(ctx context.Context, k catalog.Key) (*datacatalog.DatasetID, error)

Get the DataSetID for a task. NOTE: the version of the task is a combination of both the discoverable_version and the task signature. This is because the interfact may of changed even if the discoverable_version hadn't.

func GenerateTaskOutputsFromArtifact

func GenerateTaskOutputsFromArtifact(id core.Identifier, taskInterface core.TypedInterface, artifact *datacatalog.Artifact) (*core.LiteralMap, error)

Transform the artifact Data into task execution outputs as a literal map

Types

type CatalogClient

type CatalogClient struct {
	// contains filtered or unexported fields
}

This is the client that caches task executions to DataCatalog service.

func NewDataCatalog

func NewDataCatalog(ctx context.Context, endpoint string, insecureConnection bool) (*CatalogClient, error)

Create a new Datacatalog client for task execution caching

func (*CatalogClient) CreateArtifact

func (m *CatalogClient) CreateArtifact(ctx context.Context, datasetID *datacatalog.DatasetID, outputs *core.LiteralMap, md *datacatalog.Metadata) (*datacatalog.Artifact, error)

func (*CatalogClient) CreateDataset

func (m *CatalogClient) CreateDataset(ctx context.Context, key catalog.Key, metadata *datacatalog.Metadata) (*datacatalog.DatasetID, error)

func (*CatalogClient) Get

Get the cached task execution from Catalog. These are the steps taken: - Verify there is a Dataset created for the Task - Lookup the Artifact that is tagged with the hash of the input values - The artifactData contains the literal values that serve as the task outputs

func (*CatalogClient) GetArtifactByTag

func (m *CatalogClient) GetArtifactByTag(ctx context.Context, tagName string, dataset *datacatalog.Dataset) (*datacatalog.Artifact, error)

Helper method to retrieve an artifact by the tag

func (*CatalogClient) GetDataset

func (m *CatalogClient) GetDataset(ctx context.Context, key catalog.Key) (*datacatalog.Dataset, error)

Helper method to retrieve a dataset that is associated with the task

func (*CatalogClient) Put

func (m *CatalogClient) Put(ctx context.Context, key catalog.Key, reader io.OutputReader, metadata catalog.Metadata) error

Catalog the task execution as a cached Artifact. We associate an Artifact as the cached data by tagging the Artifact with the hash of the input values.

The steps taken to cache an execution: - Ensure a Dataset exists for the Artifact. The Dataset represents the proj/domain/name/version of the task - Create an Artifact with the execution data that belongs to the dataset - Tag the Artifact with a hash generated by the input values

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL