pipeline

package

Version: v0.8.3 Published: Jun 23, 2015 License: Apache-2.0 Imports: 16 Imported by: 0

Repository

github.com/tadhunt/pachyderm

Documentation ¶

Overview ¶

Package pipeline implements a system for running data pipelines on top of the filesystem.

Index ¶

Variables
func RunPipelines(pipelineDir, inRepo, outRepo, commit, branch, shard string) error
func WaitPipeline(pipelineDir, pipeline, commit string) error
type Pipeline
- func NewPipeline(name, dataRepo, outRepo, commit, branch, shard string) *Pipeline
type Runner
- func NewRunner(pipelineDir, inRepo, outPrefix, commit, branch, shard string) *Runner
- func (r *Runner) Cancel() error
- func (r *Runner) Run() error

Constants ¶

This section is empty.

Variables ¶

var (
	ErrFailed        = errors.New("pfs: pipeline failed")
	ErrCancelled     = errors.New("pfs: cancelled")
	ErrArgCount      = errors.New("pfs: illegal argument count")
	ErrUnkownKeyword = errors.New("pfs: unknown keyword")
)
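
Callers will usually want to tell these sentinel values apart. The following is a minimal sketch of one way to do that. It redeclares two of the errors locally so that it runs on its own; in real code they would be referenced through the package, for example `pipeline.ErrFailed`. The `describe` helper is hypothetical, and whether the package ever wraps these errors is not stated here, so `errors.Is` is used because it covers both the wrapped and the unwrapped case.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins that mirror two of the package's exported error values.
// They exist only so this sketch compiles without the package itself.
var (
	ErrFailed    = errors.New("pfs: pipeline failed")
	ErrCancelled = errors.New("pfs: cancelled")
)

// describe is a hypothetical helper that maps an error to a short status.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrFailed):
		return "failed"
	case errors.Is(err, ErrCancelled):
		return "cancelled"
	default:
		return "error: " + err.Error()
	}
}

func main() {
	fmt.Println(describe(nil))
	fmt.Println(describe(ErrFailed))
	// A wrapped sentinel is still recognised by errors.Is.
	fmt.Println(describe(fmt.Errorf("waiting: %w", ErrCancelled)))
}
```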

Functions ¶

func RunPipelines ¶

func RunPipelines(pipelineDir, inRepo, outRepo, commit, branch, shard string) error

RunPipelines runs the pipelines in a single call. Use it when you do not need to cancel them; a Runner offers Cancel when you do.

func WaitPipeline ¶ added in v0.8.3

func WaitPipeline(pipelineDir, pipeline, commit string) error

WaitPipeline waits for a pipeline to complete. If the pipeline fails, ErrFailed is returned.

Types ¶

type Pipeline ¶

type Pipeline struct {
	// contains filtered or unexported fields
}

func NewPipeline ¶

func NewPipeline(name, dataRepo, outRepo, commit, branch, shard string) *Pipeline

func (*Pipeline) Cancel ¶

func (p *Pipeline) Cancel() error

Cancel stops a pipeline by force before it has finished.

func (*Pipeline) Fail ¶ added in v0.8.3

func (p *Pipeline) Fail() error

func (*Pipeline) Finish ¶

func (p *Pipeline) Finish() error

Finish makes the final commit for the pipeline.

func (*Pipeline) Image ¶

func (p *Pipeline) Image(image string) error

Image sets the image that is being used for computations.

func (*Pipeline) Input ¶

func (p *Pipeline) Input(name string) error

Input makes a dataset available for computations in the container.

func (*Pipeline) Run ¶

func (p *Pipeline) Run(cmd []string) error

Run runs a command in the container. It assumes that `branch` has already been created. Any failure in this function leaves the branch with uncommitted dirty changes, and this state needs to be cleaned up before the pipeline is rerun. The cleanup is not done here because, even with best effort, the process crashing at the wrong time could still leave the branch in an inconsistent state.

func (*Pipeline) RunPachFile ¶

func (p *Pipeline) RunPachFile(r io.Reader) error

func (*Pipeline) Shuffle ¶

func (p *Pipeline) Shuffle(dir string) error

Shuffle rehashes an output directory. If 2 shards each have a copy of the file `foo` with the content `bar`, then after shuffling 1 of those nodes will have a file `foo` with the content `barbar` and the other will have no file `foo`.
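
The `foo` example can be reproduced with a small in-memory model. This is only a sketch of the described behaviour, not the package's implementation: the `shard` type, the `owner` function, and the choice of FNV-1a modulo the shard count as the hash are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shard is a hypothetical stand-in for one shard's output directory:
// file name -> file content.
type shard map[string]string

// owner picks the shard that holds name after shuffling.
// FNV-1a modulo the shard count is an assumption of this sketch.
func owner(name string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(name))
	return int(h.Sum32() % uint32(n))
}

// shuffle returns new shards in which every file lives on exactly one
// shard, with the contents of all former copies concatenated in shard order.
func shuffle(in []shard) []shard {
	out := make([]shard, len(in))
	for i := range out {
		out[i] = shard{}
	}
	for _, s := range in {
		for name, content := range s {
			out[owner(name, len(in))][name] += content
		}
	}
	return out
}

func main() {
	in := []shard{{"foo": "bar"}, {"foo": "bar"}}
	for i, s := range shuffle(in) {
		fmt.Printf("shard %d: %v\n", i, s)
	}
}
```

Which shard ends up holding `foo` depends on the hash; the property that matters is that exactly one of them does, and that its content is `barbar`.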

func (*Pipeline) Start ¶

func (p *Pipeline) Start() error

Start gets an outRepo ready to be used. This is where clean up of dirty state from a crash happens.
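
Taken together, the Pipeline methods suggest a lifecycle. The sketch below drives one command through it against a fake, so it runs without the package. The `stage` interface copies the method signatures documented above; the call order (Start, Image, Input, Run, Finish) and the decision to call Fail after any error are assumptions of this sketch, and `drive` and `recorder` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// stage lists the Pipeline methods used below, with the signatures
// documented for *Pipeline, so the sketch can run against a fake.
type stage interface {
	Start() error
	Image(image string) error
	Input(name string) error
	Run(cmd []string) error
	Finish() error
	Fail() error
}

// drive runs one command through the lifecycle. The order of calls and
// the use of Fail on error are assumptions, not documented behaviour.
func drive(p stage, image, input string, cmd []string) error {
	if err := p.Start(); err != nil {
		return err
	}
	steps := []func() error{
		func() error { return p.Image(image) },
		func() error { return p.Input(input) },
		func() error { return p.Run(cmd) },
	}
	for _, step := range steps {
		if err := step(); err != nil {
			_ = p.Fail() // best effort; the caller gets the original error
			return err
		}
	}
	return p.Finish()
}

// recorder is a fake stage that logs calls and fails on a chosen method.
type recorder struct {
	calls  []string
	failOn string
}

func (r *recorder) do(name string) error {
	r.calls = append(r.calls, name)
	if name == r.failOn {
		return errors.New(name + " failed")
	}
	return nil
}

func (r *recorder) Start() error         { return r.do("Start") }
func (r *recorder) Image(string) error   { return r.do("Image") }
func (r *recorder) Input(string) error   { return r.do("Input") }
func (r *recorder) Run(cmd []string) error { return r.do("Run") }
func (r *recorder) Finish() error        { return r.do("Finish") }
func (r *recorder) Fail() error          { return r.do("Fail") }

func main() {
	ok := &recorder{}
	fmt.Println(drive(ok, "ubuntu", "data", []string{"wc", "-l"}), ok.calls)

	bad := &recorder{failOn: "Run"}
	fmt.Println(drive(bad, "ubuntu", "data", []string{"wc", "-l"}), bad.calls)
}
```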

type Runner ¶

type Runner struct {
	// contains filtered or unexported fields
}

func NewRunner ¶

func NewRunner(pipelineDir, inRepo, outPrefix, commit, branch, shard string) *Runner

func (*Runner) Cancel ¶

func (r *Runner) Cancel() error

func (*Runner) Run ¶

func (r *Runner) Run() error

Run runs all of the pipelines it finds in pipelineDir. Returns the first error it encounters.
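
The "first error" contract can be illustrated in isolation. The documentation does not say whether the pipelines run concurrently; the sketch below assumes they do and models each pipeline as a plain function. `runAll` and `errBoom` are hypothetical names, and with several failing tasks the error returned is whichever one finished first.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errBoom is a placeholder failure used by the example in main.
var errBoom = errors.New("boom")

// runAll starts every task in its own goroutine, waits for all of them,
// and returns the first non-nil error in completion order (nil if none).
func runAll(tasks []func() error) error {
	errs := make(chan error, len(tasks)) // buffered so no goroutine blocks
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(t func() error) {
			defer wg.Done()
			errs <- t()
		}(task)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	tasks := []func() error{
		func() error { return nil },
		func() error { return errBoom },
		func() error { return nil },
	}
	fmt.Println(runAll(tasks))
}
```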

Source Files ¶

pipeline.go