Documentation ¶
Overview ¶
Package lang implements the reflow language.
The reflow language is a simple, type-safe, applicative domain specific language used to construct Reflow flows.
A reflow expression is one of the following, where e1, e2, .. themselves represent expressions; id represents an identifier. Other literals are exemplars.
    (e1)                       // parenthesization
    param("name", "help")      // parameter definition
    let id = e1 in e2          // let-binding
    func(e1, e2)               // function application (arbitrary arity)
    image(e1)                  // import docker image e1
    intern(e1)                 // internalize data from url e1 (string or list of strings), see below
    groupby(e1, e2)            // group the value e1 by the regular expression e2
    concat(e1, e2)             // concatenate the strings e1 and e2
    map(e1, e2)                // map the function e2 onto the list e1
    collect(e1, e2)            // filter out files in e1 that don't match the regexp in e2
    collect(e1, e2, e3)        // filter out files in e1 that don't match the regexp in e2;
                               // then rewrite keys with replacement string e3
    pullup(vs...)              // flatten values into one
    e1 { bash script {{e2}} }  // evaluate a bash script inside the image e1;
                               // materialize the value e2 into its namespace and
                               // substitute {{e2}} for its path
    e1["attr1=val1",..]        // set attributes on images
    args                       // command line arguments (list of strings)
    "literal string"           // a literal string
A reflow program comprises a number of toplevel definitions, each of which is one of:
    include("path")     // read definitions from the given file
    extern(e1, e2)      // externalize e1 to url e2
    id = e1             // bind e1 to identifier id
    func(id) = e1       // define a function where e1 is evaluated with
                        // the bound value of id upon application.
For example, the following program produces a flow that will align a pair of FASTQ files.
    // A Docker image that contains the BWA aligner.
    bwa = image("619867110810.dkr.ecr.us-west-2.amazonaws.com/wgsv1:latest")

    // A read pair stored on S3.
    r1 = intern("s3f://grail-marius/demultiplex2/W044216555475mini/FC0/W044216555475mini_S2_L001_R1_001.fastq.gz")
    r2 = intern("s3f://grail-marius/demultiplex2/W044216555475mini/FC0/W044216555475mini_S2_L001_R2_001.fastq.gz")

    // The BWA reference we'll be using. This fetches the entire s3 prefix,
    // which contains both the FASTA files as well as a BWA index.
    decoyAndViral = intern("s3://grail-scna/reference/bwa_decoy_viral_index")

    // Align a pair of fastq files using BWA. Outputs a BAM file.
    // We reserve approximately 12GB of memory for this operation.
    align(r1, r2) = bwa["rss=12000000000"] {
        /usr/local/bin/bwa mem {{decoyAndViral}}/decoy_and_viral.fa {{r1}} {{r2}} | \
            /usr/local/bin/samtools view -Sb - > $out
    }

    // Upload the results of the expression "align(r1, r2)" to a file in S3.
    extern(align(r1, r2), "s3://grail-marius/aligned.bam")
Interns ¶
If the intern function is handed a comma-separated list of URLs, it interns each separately and combines them into a single "virtual" value. The resulting value contains the union of all of the interned URLs, with each intern's keys prefixed by its basename (the directory name for directory interns, the file name for file interns). In this mode, directory interns must end in "/" so that the names are translated correctly. In the following example, "input" is a value containing INDEX and the contents of "s3://grail-marius/dir1" under the "dir1/" prefix.
input = intern("s3://grail-marius/dir1/,s3f://grail-marius/INDEX")
In this mode, empty list entries are ignored, so adding a "," after a single URL also hoists that URL into a directory. In the following example, reflow presents a directory with one file named "INDEX".
input = intern("s3f://grail-marius/INDEX,")
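The naming rule described above can be sketched in Go. This is only an illustration of the rule as stated in this section, not the package's implementation: internNames is a hypothetical helper, and it assumes the argument is already known to be in list mode (that is, it contains a comma).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// internNames is a hypothetical helper (not part of package lang). For each
// URL in a comma-separated intern argument, it returns the name under which
// that intern's contents would appear in the combined value.
func internNames(arg string) []string {
	var names []string
	for _, u := range strings.Split(arg, ",") {
		if u == "" {
			continue // empty list entries are ignored
		}
		if strings.HasSuffix(u, "/") {
			// Directory intern: contents land under "<dirname>/".
			names = append(names, path.Base(u)+"/")
		} else {
			// File intern: the file appears under its own name.
			names = append(names, path.Base(u))
		}
	}
	return names
}

func main() {
	fmt.Println(internNames("s3://grail-marius/dir1/,s3f://grail-marius/INDEX")) // [dir1/ INDEX]
	fmt.Println(internNames("s3f://grail-marius/INDEX,"))                        // [INDEX]
}
```

The trailing "/" matters in this sketch for the same reason it does in the text: without it, a directory URL would be treated as a file and its contents would not be placed under a prefix.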
Type checking and evaluation ¶
Reflow programs are type checked by inference: reflow computes the type of each expression and checks that it is subsequently used correctly.
Reflow types are one of:
    string       // the type of expressions producing strings
    num          // the type of expressions producing numeric values
    flow         // the type of expressions producing flows
    flowlist     // the type of expressions producing lists of flows
    func(n, r)   // the type of n-ary functions returning type r
    template     // the type of command literals
    image        // the type of expressions producing Docker image refs
    void         // the type of side-effecting expressions
Here are some examples of expressions and their types:
    "hello world"                            // string
    let h = "hello world" in h               // string
    image("ubuntu")                          // image
    image("ubuntu") { echo hello world }     // flow
    intern("s3://grail-marius/foobar")       // flow
    extern(out, "s3://...")                  // void
    let h(a, b, c) = string in h             // func(3, string)
The program is then evaluated into a Flow, which may in turn be evaluated on a computing cluster by the reflow evaluator.
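To make the idea of checking by inference concrete, here is a minimal Go sketch of a bottom-up type computation over a handful of the expression forms listed above. Everything in it is hypothetical: the Type constants, the simplified Expr struct, and the infer function are stand-ins invented for illustration and do not mirror the package's own Type or Expr.

```go
package main

import "fmt"

// Type is a toy stand-in for a few of the reflow types listed above.
type Type int

const (
	TypeError Type = iota
	TypeString
	TypeImage
	TypeFlow
	TypeVoid
)

func (t Type) String() string {
	return [...]string{"error", "string", "image", "flow", "void"}[t]
}

// Expr is a hypothetical, simplified expression node.
type Expr struct {
	Op   string  // "lit", "image", "exec", "intern", or "extern"
	Args []*Expr // operands
}

// infer computes the type of e from the types of its operands and reports
// an error when an operand is used at the wrong type.
func infer(e *Expr) (Type, error) {
	sigs := map[string]struct {
		args []Type
		ret  Type
	}{
		"lit":    {nil, TypeString},
		"image":  {[]Type{TypeString}, TypeImage},
		"exec":   {[]Type{TypeImage}, TypeFlow},
		"intern": {[]Type{TypeString}, TypeFlow},
		"extern": {[]Type{TypeFlow, TypeString}, TypeVoid},
	}
	sig, ok := sigs[e.Op]
	if !ok {
		return TypeError, fmt.Errorf("unknown op %q", e.Op)
	}
	if len(e.Args) != len(sig.args) {
		return TypeError, fmt.Errorf("%s: expected %d arguments, got %d", e.Op, len(sig.args), len(e.Args))
	}
	for i, arg := range e.Args {
		t, err := infer(arg)
		if err != nil {
			return TypeError, err
		}
		if t != sig.args[i] {
			return TypeError, fmt.Errorf("%s: argument %d has type %s, expected %s", e.Op, i+1, t, sig.args[i])
		}
	}
	return sig.ret, nil
}

func main() {
	lit := &Expr{Op: "lit"}
	good := &Expr{Op: "extern", Args: []*Expr{&Expr{Op: "intern", Args: []*Expr{lit}}, lit}}
	fmt.Println(infer(good))
	bad := &Expr{Op: "extern", Args: []*Expr{lit, lit}} // a string where a flow is required
	fmt.Println(infer(bad))
}
```

The point of the sketch is the ordering: every expression is assigned a type before anything is evaluated, so a misuse such as externalizing a string is reported without running a flow.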
Bugs and future work ¶
The language has many flaws and short-cuts. In particular, it is somewhat hamstrung by its static type checking discipline: for example, we currently restrict the type of function arguments so that they may be safely inferred without a more complicated type inferencing scheme.
We can get rid of this restriction while also retaining safety by more carefully staging reflow evaluation. Currently, a reflow program is evaluated into a flow, but the semantics of map demand that some evaluation is deferred (since we don't know its input beforehand). However, we can sever this tie by representing maps differently. Namely, they may evaluate to a flow where arguments are "holes", named by a de Bruijn index (so that maps may be nested safely). This evaluation scheme would permit the reflow language to use runtime typing while at the same time exposing errors before the (expensive) flow evaluation occurs.
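As a rough illustration of the "holes" idea, the following Go sketch represents a map body as a tree in which a hole carries an index counting how many enclosing maps to skip, and fills the holes that belong to one particular map. The Node type and the fill and show functions are hypothetical and exist only for this example; the sketch also assumes the value being filled in contains no holes of its own.

```go
package main

import "fmt"

// Node is a hypothetical flow node used only for this illustration.
type Node struct {
	Kind  string  // "const", "hole", "exec", or "map"
	Value string  // for "const"
	Index int     // for "hole": number of enclosing maps to skip
	Args  []*Node // for "exec": operands; for "map": the body
}

// fill replaces every hole that refers to the map at the given depth with
// arg. It assumes arg itself contains no holes.
func fill(n *Node, depth int, arg *Node) *Node {
	switch n.Kind {
	case "hole":
		if n.Index == depth {
			return arg
		}
		return n
	case "const":
		return n
	}
	next := depth
	if n.Kind == "map" {
		next++ // inside a nested map, outer holes are one binder further away
	}
	out := &Node{Kind: n.Kind}
	for _, a := range n.Args {
		out.Args = append(out.Args, fill(a, next, arg))
	}
	return out
}

// show renders a node as text, writing hole i as "#i".
func show(n *Node) string {
	switch n.Kind {
	case "const":
		return n.Value
	case "hole":
		return fmt.Sprintf("#%d", n.Index)
	}
	s := n.Kind + "("
	for i, a := range n.Args {
		if i > 0 {
			s += ", "
		}
		s += show(a)
	}
	return s + ")"
}

func main() {
	// Body of an outer map: exec(#0, map(exec(#0, #1))).
	// Inside the inner map, #0 is the inner element and #1 the outer one.
	inner := &Node{Kind: "map", Args: []*Node{
		{Kind: "exec", Args: []*Node{{Kind: "hole", Index: 0}, {Kind: "hole", Index: 1}}},
	}}
	body := &Node{Kind: "exec", Args: []*Node{{Kind: "hole", Index: 0}, inner}}
	fmt.Println(show(fill(body, 0, &Node{Kind: "const", Value: "a"}))) // exec(a, map(exec(#0, a)))
}
```

Because holes are identified by position rather than by name, the inner map's own hole is left untouched when the outer map's element is filled in, which is what makes nesting safe in this scheme.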
The language also has several other problems and inconsistencies. First, it has shift-reduce conflicts, which we should seek to avoid. Second, it lacks some common features for which users compensate. For example, retaining filename information across groupby-map-merge operations is cumbersome. This can be addressed in future refinements of the language.
Types ¶
type Error ¶
    type Error struct {
        W io.Writer // The io.Writer to which errors are reported.
        N int       // The error count.
    }
Error implements error reporting during parsing, typechecking, and evaluation. It is not safe for concurrent access.
type EvalEnv ¶
    type EvalEnv struct {
        *Error

        // Def contains toplevel defs available in the environment.
        Def map[string]*Expr

        // Param returns the value of parameter id.
        // The second argument returned indicates whether the
        // parameter was defined.
        Param func(id, help string) (string, bool)
        // contains filtered or unexported fields
    }
EvalEnv contains the evaluation environment used by reflow. It contains a set of defs, params, and a value environment. It is an error reporter.
func (*EvalEnv) Push ¶
func (e *EvalEnv) Push()
Push pushes the current evaluation environment onto the stack.
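The documentation does not show how the environment stack is used, so the following is only a sketch of the general technique: saving the current bindings before entering a nested scope (for example the body of a let-binding) and restoring them afterwards. The scope type and its Bind and Pop methods are hypothetical and are not taken from package lang.

```go
package main

import "fmt"

// scope is a hypothetical stand-in for a value environment with a save stack.
type scope struct {
	vals  map[string]string
	saved []map[string]string
}

func newScope() *scope { return &scope{vals: map[string]string{}} }

// Push saves a snapshot of the current bindings on the stack.
func (s *scope) Push() {
	snap := make(map[string]string, len(s.vals))
	for k, v := range s.vals {
		snap[k] = v
	}
	s.saved = append(s.saved, snap)
}

// Pop restores the most recently pushed bindings.
// It panics if nothing has been pushed.
func (s *scope) Pop() {
	s.vals = s.saved[len(s.saved)-1]
	s.saved = s.saved[:len(s.saved)-1]
}

// Bind sets id to v in the current bindings.
func (s *scope) Bind(id, v string) { s.vals[id] = v }

func main() {
	env := newScope()
	env.Bind("h", "outer")
	env.Push()
	env.Bind("h", "inner") // shadows the outer binding
	fmt.Println(env.vals["h"])
	env.Pop()
	fmt.Println(env.vals["h"]) // the outer binding is visible again
}
```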
type Expr ¶
Expr implements expressions in reflow. They contain the expression's op and arguments (left, right, list) as well as any literal values (ident, val).
func (*Expr) Eval ¶
Eval evaluates the expression e in the evaluation environment env. Eval assumes the expression has been typechecked; thus the expression is well-formed.
type Lexer ¶
    type Lexer struct {
        // File is the filename reported by the lexer's position.
        File string

        // Body contains the text to be lexed.
        Body io.Reader

        // Mode specifies the Lexer mode.
        Mode LexerMode

        // HashVersion is the hash version string, if any.
        HashVersion string

        Expr  *Expr
        Stmts []*Stmt
        // contains filtered or unexported fields
    }
Lexer is a lexer for reflow. Its tokens are defined in the reflow grammar. The lexer composes Go's text/scanner: it knows how to tokenize special identifiers, and performs semicolon insertion in the style of Go.
The lexer also manages include directives, which are implemented by recursively instantiating a lexer for the included file. (If we want to support dynamic inclusion, this mechanism would need to be moved to the evaluator.)
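For readers unfamiliar with text/scanner, the sketch below shows one way to tokenize input with it and to insert semicolons at line ends in roughly the style of Go. It is not the reflow lexer: tokenize is a hypothetical function, the set of tokens that trigger insertion is an assumption, and special identifiers and include handling are left out.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// tokenize is a hypothetical helper that splits src into tokens using
// text/scanner and inserts ";" at a newline when the preceding token could
// end a statement.
func tokenize(src string) []string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Filename = "example.reflow" // assumed name, used only in scanner error messages
	// Report newlines as tokens instead of skipping them as whitespace.
	s.Whitespace ^= 1 << '\n'

	var toks []string
	needSemi := false // whether a newline here should end the statement
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		if tok == '\n' {
			if needSemi {
				toks = append(toks, ";")
				needSemi = false
			}
			continue
		}
		toks = append(toks, s.TokenText())
		switch tok {
		case scanner.Ident, scanner.String, scanner.Int, ')', ']', '}':
			needSemi = true
		default:
			needSemi = false
		}
	}
	if needSemi {
		toks = append(toks, ";")
	}
	return toks
}

func main() {
	src := "bwa = image(\"ubuntu\")\nout = bwa\n"
	fmt.Println(strings.Join(tokenize(src), " "))
}
```

Semicolon insertion of this kind lets the grammar treat every toplevel definition as terminated, while the program text itself stays free of explicit terminators.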
type Program ¶
    type Program struct {
        // Errors is the writer to which errors are reported.
        Errors io.Writer

        // File is the name of the file containing the reflow program.
        File string

        // Args contains the command-line arguments (but not flags)
        // used for this program invocation. Args must be set before
        // calling Eval.
        Args []string
        // contains filtered or unexported fields
    }
Program represents a reflow program. It parses, typechecks, and evaluates reflow programs, managing parameters via Go's flags package.
func (*Program) Eval ¶
Eval evaluates the program and returns a flow. All toplevel extern statements are merged into a single flow.Merge node.
func (*Program) Flags ¶
Flags returns the set of flags that are defined by the program. It is defined only after ParseAndTypecheck has been called. Flags may be set to parameterize the program.
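One plausible way to back parameters with Go's flag package is sketched below. It is an assumption-laden illustration rather than the package's code: defineParams and paramFunc are hypothetical helpers, the returned function merely has the same shape as the Param field of EvalEnv, and "defined" is interpreted here as "a flag with that name is registered".

```go
package main

import (
	"flag"
	"fmt"
)

// defineParams is a hypothetical helper that registers one string flag per
// (name, help) pair, as a program's param("name", "help") expressions might.
func defineParams(prog string, params map[string]string) *flag.FlagSet {
	fs := flag.NewFlagSet(prog, flag.ContinueOnError)
	for name, help := range params {
		fs.String(name, "", help)
	}
	return fs
}

// paramFunc returns a lookup function with the same shape as EvalEnv.Param.
// The boolean reports whether a flag named id is registered in fs.
func paramFunc(fs *flag.FlagSet) func(id, help string) (string, bool) {
	return func(id, help string) (string, bool) {
		f := fs.Lookup(id)
		if f == nil {
			return "", false
		}
		return f.Value.String(), true
	}
}

func main() {
	fs := defineParams("align", map[string]string{
		"sample": "sample name",
		"ref":    "reference prefix",
	})
	// Flags are set from the command line before evaluation.
	if err := fs.Parse([]string{"-sample=W044"}); err != nil {
		fmt.Println(err)
		return
	}
	param := paramFunc(fs)
	fmt.Println(param("sample", "sample name"))
	fmt.Println(param("undeclared", ""))
}
```

Registering the flags before parsing matches the ordering the documentation describes: the flag set is only meaningful once the program's parameters are known.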
func (*Program) ModuleType ¶
ModuleType computes and returns the Reflow module type for this program. This is used for bridging "v0" scripts into "v1" modules. This should be called only after type checking has completed.
For simplicity we only export non-function values, since they always evaluate either to immediate values.T or to Flows, both of which have defined digests. We don't let functions escape.
func (*Program) ModuleValue ¶
ModuleValue computes the Reflow module value given the set of defined parameters.
func (*Program) ParseAndTypecheck ¶
ParseAndTypecheck parses and typechecks the program presented by the io.Reader r. It returns any error encountered.
type Stmt ¶
Stmt implements a statement in reflow. It contains its operation and arguments (left, right, list).
type Type ¶
type Type int
Type is the type of types in the reflow language.
func (Type) ReflowType ¶
ReflowType converts a "v0" type to a Reflow type. A nil is returned if the type is not supported as a Reflow type.
type TypeEnv ¶
    type TypeEnv struct {
        *Error

        // The set of toplevel defs
        Def map[string]*Expr
        // contains filtered or unexported fields
    }
TypeEnv is a type environment used during typechecking and type inference. It is an error reporter.