startf

package
v0.10.0
Published: May 7, 2021 License: GPL-3.0 Imports: 24 Imported by: 0

README

Qri Starlark Transformation Syntax

Qri ("query") is about datasets. Transformations are repeatable scripts for generating a dataset. Starlark is a scripting language from Google that feels a lot like Python. This package implements starlark as a transformation syntax. Starlark transformations are about as close as one can get to the full power of a programming language as a transformation syntax. Often you need this degree of control to generate a dataset.

Typical examples of a starlark transformation include:

  • combining paginated calls to an API into a single dataset
  • downloading unstructured data from the internet to extract structured data
  • pulling raw data off the web & turning it into a dataset

We're excited about starlark for a few reasons:

  • Python syntax - many people working in data science these days write Python. We like that, starlark likes that. Dope.
  • deterministic subset of Python - unlike Python, starlark removes properties that reduce introspection into code behaviour. Things like while loops and recursive functions are omitted, making it possible for qri to infer how a given transformation will behave.
  • parallel execution - thanks to this deterministic requirement (and the lack of a global interpreter lock), starlark functions can be executed in parallel. Combined with peer-to-peer networking, we're hoping to advance transformations toward peer-driven distributed computing. More on that in the coming months.

Getting started

If you're mainly interested in learning how to write starlark transformations, our documentation is a better place to start. If you're interested in contributing to the way starlark transformations work, this is the place!

The easiest way to see starlark transformations in action is to use qri. This startf package powers all the starlark stuff in qri. Assuming you have the Go programming language installed, the following should work from a terminal:

# get this package
$ go get github.com/qri-io/startf

# navigate to package
$ cd $GOPATH/src/github.com/qri-io/startf

# run tests

$ go test ./...

Often the next steps are to install qri, mess with this startf package, then rebuild qri with your changes to see them in action within qri itself.

Starlark Special Functions

Special functions are the core of a starlark transform script. Here's an example of a simple special function that sets a metadata field of a dataset to a constant:

def transform(ds, ctx):
  ds.set_meta("hello", "world")

Here's something slightly more complicated (but still very contrived) that modifies a dataset by adding up the length of all of the elements in a dataset body:

def transform(ds, ctx):
  body = ds.get_body()
  # initialize count before the check so it is defined even when there is no body
  count = 0
  if body != None:
    for entry in body:
      count += len(entry)
  ds.set_body([{"total": count}])

Starlark special functions have a few rules on top of starlark itself:

  • Special functions always accept a transformation context (the ctx arg)
  • When you define a special function, qri calls it for you
  • All special functions are optional (you don't need to define them), except transform. transform is required.
  • Special functions are always called in the same order
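
The rules above can be modelled in a few lines of Go. This is an illustrative sketch only, not how startf is implemented: the specialFunc type, the runSpecialFuncs helper, and the assumed order (download, then transform) are hypothetical names and choices based on the two special functions this README mentions.

```go
package main

import (
	"errors"
	"fmt"
)

// specialFunc is a hypothetical stand-in for a function defined in a script.
// It receives the results of earlier steps and returns its own result.
type specialFunc func(ctx map[string]interface{}) (interface{}, error)

// callOrder is the assumed fixed order in which special functions run.
var callOrder = []string{"download", "transform"}

// runSpecialFuncs calls each defined special function in callOrder.
// Optional functions are skipped when absent; transform is required.
func runSpecialFuncs(script map[string]specialFunc) ([]string, error) {
	if _, ok := script["transform"]; !ok {
		return nil, errors.New("transform is required")
	}
	ctx := map[string]interface{}{}
	called := []string{}
	for _, name := range callOrder {
		fn, ok := script[name]
		if !ok {
			continue // optional function was not defined
		}
		res, err := fn(ctx)
		if err != nil {
			return called, fmt.Errorf("%s: %w", name, err)
		}
		// Later steps can read earlier results, similar to ctx.download.
		ctx[name] = res
		called = append(called, name)
	}
	return called, nil
}

func main() {
	script := map[string]specialFunc{
		"transform": func(ctx map[string]interface{}) (interface{}, error) {
			return ctx["download"], nil
		},
		"download": func(ctx map[string]interface{}) (interface{}, error) {
			return []int{1, 2, 3}, nil
		},
	}
	called, err := runSpecialFuncs(script)
	fmt.Println(called, err)
}
```

The map has no ordering of its own, so the order comes entirely from callOrder, which is the point of the last rule.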

Another important special function is download, which allows access to the http package:

load("http.star", "http")

def download(ctx):
  data = http.get("http://example.com/data.json")
  return data

The result of this special function can be accessed using ctx.download:

def transform(ds, ctx):
  ds.set_body(ctx.download)

More docs on the provided API are coming soon.

Running a transform

Let's say the above function is saved as transform.star. You can run it to create a new dataset by using:

qri save --file=transform.star me/dataset_name

Or, you can add more details by creating a dataset file (saved as dataset.yaml, for example) with additional structure:

name: dataset_name
transform:
  scriptpath: transform.star
meta:
  title: My awesome dataset

Then invoke qri:

qri save --file=dataset.yaml

Fun! More info is available on our docs site.


Documentation

Overview

Package startf implements dataset transformations using the starlark programming dialect. For more info on starlark, check github.com/google/starlark

Index

Constants

This section is empty.

Variables

var DefaultModuleLoader = func(thread *starlark.Thread, module string) (dict starlark.StringDict, err error) {
	return starlib.Loader(thread, module)
}

DefaultModuleLoader is the loader ExecScript will use unless configured otherwise

var ErrNotDefined = fmt.Errorf("not defined")

ErrNotDefined is for when a starlark value is not defined or does not exist

var (
	// ErrNtwkDisabled is returned whenever a network call is attempted but h.NetworkEnabled is false
	ErrNtwkDisabled = fmt.Errorf("network use is disabled. http can only be used during download step")
)

var Version = version.Version

Version is the version of qri that this transform was run with

Functions

func AddDatasetLoader

func AddDatasetLoader(loader dsref.Loader) func(o *ExecOpts)

AddDatasetLoader is required to enable the load_dataset starlark builtin

func AddEventsChannel

func AddEventsChannel(eventsCh chan event.Event) func(o *ExecOpts)

AddEventsChannel sets an event channel to send events on

func AddMutateFieldCheck

func AddMutateFieldCheck(check func(path ...string) error) func(o *ExecOpts)

AddMutateFieldCheck provides a checkFunc to ExecScript

func AddQriRepo

func AddQriRepo(repo repo.Repo) func(o *ExecOpts)

AddQriRepo adds a qri repo to execution options, providing scripted access to assets within the repository

func DefaultExecOpts

func DefaultExecOpts(o *ExecOpts)

DefaultExecOpts applies default options to an ExecOpts pointer

func Error

func Error(thread *starlark.Thread, _ *starlark.Builtin, args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error)

Error halts program execution with an error

func ExecScript

func ExecScript(ctx context.Context, next, prev *dataset.Dataset, opts ...func(o *ExecOpts)) error

ExecScript executes a transformation against a starlark script file. The next dataset pointer may be modified, while the prev dataset pointer is read-only. At a bare minimum this function will set transformation details, but starlark scripts can modify many parts of the dataset pointer, including meta, structure, and transform. opts may provide more ways for output to be produced from this function.

func MutatedComponentsFunc

func MutatedComponentsFunc(dsp *dataset.Dataset) func(path ...string) error

MutatedComponentsFunc returns a function for checking if a field has been modified. It's a kind of data structure mutual exclusion lock. TODO (b5) - this should be refactored & expanded
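
As a rough illustration of the func(path ...string) error shape shared by AddMutateFieldCheck and MutatedComponentsFunc, the sketch below builds a check from a list of locked component names. The lockedComponentsFunc helper, the choice to match on the first path element, and the error text are all assumptions made for this example, not the package's actual behaviour.

```go
package main

import "fmt"

// lockedComponentsFunc is a hypothetical helper. It returns a check that
// errors when the first element of path names a locked component.
func lockedComponentsFunc(locked ...string) func(path ...string) error {
	set := make(map[string]struct{}, len(locked))
	for _, name := range locked {
		set[name] = struct{}{}
	}
	return func(path ...string) error {
		if len(path) == 0 {
			return nil // nothing to check
		}
		if _, ok := set[path[0]]; ok {
			return fmt.Errorf("component %q is locked and cannot be mutated by the script", path[0])
		}
		return nil
	}
}

func main() {
	check := lockedComponentsFunc("meta")
	fmt.Println(check("meta", "title")) // locked component, returns an error
	fmt.Println(check("body"))          // not locked, returns nil
}
```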

func SetErrWriter

func SetErrWriter(w io.Writer) func(o *ExecOpts)

SetErrWriter provides a writer to record the "stderr" diagnostic output of the transform script

func SetSecrets

func SetSecrets(secrets map[string]string) func(o *ExecOpts)

SetSecrets assigns environment secret key-value pairs for script execution

Types

type EntryReader

type EntryReader struct {
	// contains filtered or unexported fields
}

EntryReader implements the dsio.EntryReader interface for starlark.Iterable values

func NewEntryReader

func NewEntryReader(st *dataset.Structure, iter starlark.Iterable) *EntryReader

NewEntryReader creates a new Entry Reader

func (*EntryReader) Close

func (r *EntryReader) Close() error

Close finalizes the reader

func (*EntryReader) ReadEntry

func (r *EntryReader) ReadEntry() (e dsio.Entry, err error)

ReadEntry reads one entry from the reader

func (*EntryReader) Structure

func (r *EntryReader) Structure() *dataset.Structure

Structure gives this reader's structure

type ExecOpts

type ExecOpts struct {
	// loader for loading datasets
	DatasetLoader dsref.Loader
	// supply a repo to make the 'qri' module available in starlark
	Repo repo.Repo
	// allow floating-point numbers
	AllowFloat bool
	// allow set data type
	AllowSet bool
	// allow lambda expressions
	AllowLambda bool
	// allow nested def statements
	AllowNestedDef bool
	// passed-in secrets (eg: API keys)
	Secrets map[string]interface{}
	// global values to pass for script execution
	Globals starlark.StringDict
	// func that errors if field specified by path is mutated
	MutateFieldCheck func(path ...string) error
	// provide a writer to record script "stderr" output to
	ErrWriter io.Writer
	// starlark module loader function
	ModuleLoader ModuleLoader
	// channel to send events on
	EventsCh chan event.Event
}

ExecOpts defines options for execution
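
Every option function on this page has the shape func(o *ExecOpts), which is the functional options pattern. The sketch below shows how such options compose, using a trimmed local execOpts type rather than the real struct. The default values and the newExecOpts helper are assumptions for illustration; how ExecScript applies its opts internally is not documented here.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// execOpts is a trimmed, local stand-in for startf.ExecOpts.
type execOpts struct {
	AllowFloat bool
	Secrets    map[string]interface{}
	ErrWriter  io.Writer
}

// defaultExecOpts mirrors the shape of DefaultExecOpts.
// The default values chosen here are assumptions.
func defaultExecOpts(o *execOpts) {
	o.AllowFloat = true
	o.ErrWriter = io.Discard
}

// setSecrets mirrors the shape of SetSecrets: it returns a function
// that copies the secrets into the options when applied.
func setSecrets(secrets map[string]string) func(o *execOpts) {
	return func(o *execOpts) {
		if secrets == nil {
			return
		}
		o.Secrets = make(map[string]interface{}, len(secrets))
		for k, v := range secrets {
			o.Secrets[k] = v
		}
	}
}

// setErrWriter mirrors the shape of SetErrWriter.
func setErrWriter(w io.Writer) func(o *execOpts) {
	return func(o *execOpts) { o.ErrWriter = w }
}

// newExecOpts applies defaults first, then each option in order,
// so later options override earlier ones.
func newExecOpts(opts ...func(o *execOpts)) *execOpts {
	o := &execOpts{}
	defaultExecOpts(o)
	for _, opt := range opts {
		opt(o)
	}
	return o
}

func main() {
	o := newExecOpts(
		setSecrets(map[string]string{"api_key": "hypothetical-value"}),
		setErrWriter(os.Stderr),
	)
	fmt.Println(o.AllowFloat, len(o.Secrets), o.ErrWriter == os.Stderr)
}
```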

type HTTPGuard

type HTTPGuard struct {
	NetworkEnabled bool
}

HTTPGuard protects network requests, only allowing them when the network is enabled

func (*HTTPGuard) Allowed

func (h *HTTPGuard) Allowed(req *http.Request) error

Allowed implements starlib/http RequestGuard

func (*HTTPGuard) DisableNtwk

func (h *HTTPGuard) DisableNtwk()

DisableNtwk prevents network calls from succeeding

func (*HTTPGuard) EnableNtwk

func (h *HTTPGuard) EnableNtwk()

EnableNtwk allows network calls
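
A guard with this behaviour can be sketched in a few lines. The httpGuard type below is a local re-implementation written for illustration, based only on the method descriptions and the ErrNtwkDisabled message above; it is not copied from the package source.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errNtwkDisabled reuses the message documented for ErrNtwkDisabled.
var errNtwkDisabled = errors.New("network use is disabled. http can only be used during download step")

// httpGuard is a local stand-in for HTTPGuard.
type httpGuard struct {
	NetworkEnabled bool
}

// Allowed returns an error for every request while the network is disabled.
func (h *httpGuard) Allowed(req *http.Request) error {
	if !h.NetworkEnabled {
		return errNtwkDisabled
	}
	return nil
}

// EnableNtwk allows network calls.
func (h *httpGuard) EnableNtwk() { h.NetworkEnabled = true }

// DisableNtwk prevents network calls from succeeding.
func (h *httpGuard) DisableNtwk() { h.NetworkEnabled = false }

func main() {
	guard := &httpGuard{} // the zero value starts with the network disabled
	req, err := http.NewRequest("GET", "http://example.com/data.json", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(guard.Allowed(req)) // rejected: network is disabled

	guard.EnableNtwk() // for example, while a download step runs
	fmt.Println(guard.Allowed(req))

	guard.DisableNtwk() // for example, once the download step has finished
	fmt.Println(guard.Allowed(req))
}
```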

type ModuleLoader

type ModuleLoader func(thread *starlark.Thread, module string) (starlark.StringDict, error)

ModuleLoader is a function that can load starlark modules
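
Because a loader is just a function, one loader can wrap another. The sketch below serves a few in-memory modules and defers to a fallback for everything else. The value, stringDict, and thread types are local stand-ins for the starlark types in the signature above, and withLocalModules is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// value and stringDict are local stand-ins for starlark.Value and
// starlark.StringDict.
type value interface{}
type stringDict map[string]value

// thread is a local stand-in for starlark.Thread.
type thread struct{ Name string }

// moduleLoader has the same shape as ModuleLoader, using the stand-in types.
type moduleLoader func(t *thread, module string) (stringDict, error)

// withLocalModules returns a loader that serves modules from an in-memory
// map first and defers to fallback for everything else.
func withLocalModules(local map[string]stringDict, fallback moduleLoader) moduleLoader {
	return func(t *thread, module string) (stringDict, error) {
		if dict, ok := local[module]; ok {
			return dict, nil
		}
		if fallback == nil {
			return nil, fmt.Errorf("module %q not found", module)
		}
		return fallback(t, module)
	}
}

func main() {
	loader := withLocalModules(map[string]stringDict{
		"greeting.star": {"hello": "world"},
	}, nil)

	dict, err := loader(&thread{Name: "example"}, "greeting.star")
	fmt.Println(dict["hello"], err)

	_, err = loader(&thread{Name: "example"}, "missing.star")
	fmt.Println(err)
}
```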

type StepRunner

type StepRunner struct {
	// contains filtered or unexported fields
}

StepRunner is able to run individual transform steps

func NewStepRunner

func NewStepRunner(prev *dataset.Dataset, opts ...func(o *ExecOpts)) *StepRunner

NewStepRunner returns a new StepRunner for the given dataset

func (*StepRunner) LoadDatasetFunc

func (r *StepRunner) LoadDatasetFunc(ctx context.Context, target *dataset.Dataset) func(thread *starlark.Thread, fn *starlark.Builtin, args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error)

LoadDatasetFunc returns an implementation of the starlark load_dataset function

func (*StepRunner) RunStep

func (r *StepRunner) RunStep(ctx context.Context, ds *dataset.Dataset, st *dataset.TransformStep) error

RunStep runs the single transform step using the dataset

Directories

Path Synopsis
Package ds exposes the qri dataset document model into starlark. ds defines the qri dataset object within starlark.
