skytf

v0.1.0
Published: Oct 5, 2018 License: BSD-3-Clause Imports: 16 Imported by: 0

README

Qri Skylark Transformation Syntax

Qri ("query") is about datasets. Transformations are repeatable scripts for generating a dataset. Skylark is a scripting language from Google that feels a lot like python. This package implements skylark as a transformation syntax. Skylark transformations are about as close as one can get to the full power of a programming language as a transformation syntax. Often you need this degree of control to generate a dataset.

Typical examples of a skylark transformation include:

  • combining paginated calls to an API into a single dataset
  • downloading unstructured data from the internet and extracting structured data from it
  • re-shaping raw input data before saving a dataset

We're excited about skylark for a few reasons:

  • python syntax - many people working in data science these days write python, we like that, skylark likes that. dope.
  • deterministic subset of python - unlike python, skylark removes properties that reduce introspection into code behaviour. things like while loops and recursive functions are omitted, making it possible for qri to infer how a given transformation will behave.
  • parallel execution - thanks to this deterministic requirement (and lack of a global interpreter lock) skylark functions can be executed in parallel. Combined with peer-2-peer networking, we're hoping to advance transformations toward peer-driven distributed computing. More on that in the coming months.

Getting started

If you're mainly interested in learning how to write skylark transformations, our documentation is a better place to start. If you're interested in contributing to the way skylark transformations work, this is the place!

The easiest way to see skylark transformations in action is to use qri. This skytf package powers all the skylark stuff in qri. Assuming you have the go programming language installed, the following should work from a terminal:

# get this package
$ go get github.com/qri-io/skytf

# navigate to package
$ cd $GOPATH/src/github.com/qri-io/skytf

# run tests
$ go test ./...

Often the next steps are to install qri, mess with this skytf package, then rebuild qri with your changes to see them in action within qri itself.

Skylark Data Functions

Data Functions are the core of a skylark transform script. Here's an example of a simple data function that returns a constant result:

def transform(qri):
  return ["hello","world"]

Here's something slightly more complicated that modifies a previous dataset by adding up the length of all of the elements:

def transform(qri):
  body = qri.get_body()
  count = 0
  for entry in body:
    count += len(entry)
  return [{"total": count}]

Skylark transformations have a few rules on top of skylark itself:

  • Data functions always return an array or dictionary/object, representing the new dataset body
  • When you define a data function, qri calls it for you
  • Each individual data function is optional (you don't need to define all of them), but a transformation must have at least one data function
  • Data functions are always called in the same order
  • Data functions often get a qri parameter that lets them do special things

More docs on the provided API are coming soon.
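The ordering rules above can be sketched in plain Go. This is only an illustration of "optional, but at least one, and always in the same order"; it is not how skytf is implemented. The names runSteps, stepOrder, and errNoDataFuncs are hypothetical, and the step names are assumptions: "transform" comes from the examples above and "download" is borrowed from the ErrNtwkDisabled message in this package's docs.

```go
package main

import (
	"errors"
	"fmt"
)

// stepOrder is the fixed call order. The step names are assumptions made for
// this sketch, not a list taken from skytf.
var stepOrder = []string{"download", "transform"}

// errNoDataFuncs is a hypothetical error for a script that defines no data function.
var errNoDataFuncs = errors.New("transformation must define at least one data function")

// runSteps calls whichever data functions are defined, always in stepOrder,
// and collects their results. Names outside stepOrder are ignored.
func runSteps(defined map[string]func() string) ([]string, error) {
	var results []string
	for _, name := range stepOrder {
		fn, ok := defined[name]
		if !ok {
			continue // each individual data function is optional
		}
		results = append(results, fn())
	}
	if len(results) == 0 {
		return nil, errNoDataFuncs
	}
	return results, nil
}

func main() {
	// Map iteration order does not matter: stepOrder decides the call order.
	out, err := runSteps(map[string]func() string{
		"transform": func() string { return "transform ran" },
		"download":  func() string { return "download ran" },
	})
	fmt.Println(out, err) // prints "[download ran transform ran] <nil>"
}
```

Driving the loop from a fixed slice rather than from the map keeps the call order deterministic, which matches the spirit of the rules listed above.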

Running a transform

Let's say the above function is saved as transform.sky. First, create a configuration file (saved as config.yaml, for example) with at least the minimal structure:

name: dataset_name
transform:
  scriptpath: transform.sky
  config:
    org: qri-io
    repo: frontend

Then invoke qri:

qri update --file=config.yaml me/dataset_name

If the script uses qri.get_body, there must be an existing version of the dataset already. Otherwise, if the dataset doesn't exist yet, and is being created from some other source, use qri add instead.


Documentation

Overview

Package skytf implements dataset transformations using the skylark programming dialect. For more info on skylark, check github.com/google/skylark

Constants

const Version = "0.1.0"

Version is the current version of skytf. This version number is written with each transformation execution

Variables

var ErrNotDefined = fmt.Errorf("not defined")

ErrNotDefined is for when a skylark value is not defined or does not exist

var (

	// ErrNtwkDisabled is returned whenever a network call is attempted but h.NetworkEnabled is false
	ErrNtwkDisabled = fmt.Errorf("network use is disabled. http can only be used during download step")
)

Functions

func AddQriNodeOpt added in v0.1.0

func AddQriNodeOpt(node *p2p.QriNode) func(o *ExecOpts)

AddQriNodeOpt adds a qri node to execution options

func DefaultExecOpts

func DefaultExecOpts(o *ExecOpts)

DefaultExecOpts applies default options to an ExecOpts pointer

func Error

func Error(thread *skylark.Thread, _ *skylark.Builtin, args skylark.Tuple, kwargs []skylark.Tuple) (skylark.Value, error)

Error halts program execution with an error

func ExecFile

func ExecFile(ds *dataset.Dataset, filename string, bodyFile cafs.File, opts ...func(o *ExecOpts)) (cafs.File, error)

ExecFile executes a transformation against a skylark file located at filename, giving back a cafs.File of resulting data. ExecFile modifies the given dataset pointer. At bare minimum it will set transformation details, but skylark scripts can modify many parts of the dataset pointer, including meta, structure, and transform

Types

type EntryReader

type EntryReader struct {
	// contains filtered or unexported fields
}

EntryReader implements the dsio.EntryReader interface for skylark.Iterable values

func NewEntryReader

func NewEntryReader(st *dataset.Structure, iter skylark.Iterable) *EntryReader

NewEntryReader creates a new EntryReader

func (*EntryReader) ReadEntry

func (r *EntryReader) ReadEntry() (e dsio.Entry, err error)

ReadEntry reads one entry from the reader

func (*EntryReader) Structure

func (r *EntryReader) Structure() *dataset.Structure

Structure gives this reader's structure
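A reader with a ReadEntry method is typically drained in a loop. The sketch below shows that loop using local stand-in types rather than the real dsio package, so it runs on its own. The entry struct, the sliceReader type, and the readAll helper are hypothetical, and the use of io.EOF to signal the end of the data is an assumption about the reader's convention, not something stated in these docs.

```go
package main

import (
	"fmt"
	"io"
)

// entry is a local stand-in for dsio.Entry; its fields are assumptions.
type entry struct {
	Index int
	Value interface{}
}

// entryReader mirrors the shape of the ReadEntry method documented above.
type entryReader interface {
	ReadEntry() (entry, error)
}

// sliceReader is a hypothetical reader over an in-memory slice.
type sliceReader struct {
	vals []interface{}
	i    int
}

// ReadEntry returns the next value, or io.EOF once the slice is exhausted.
func (r *sliceReader) ReadEntry() (entry, error) {
	if r.i >= len(r.vals) {
		return entry{}, io.EOF
	}
	e := entry{Index: r.i, Value: r.vals[r.i]}
	r.i++
	return e, nil
}

// readAll drains a reader, treating io.EOF as the normal end of data.
func readAll(r entryReader) ([]entry, error) {
	var out []entry
	for {
		e, err := r.ReadEntry()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, e)
	}
}

func main() {
	entries, err := readAll(&sliceReader{vals: []interface{}{"hello", "world"}})
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e.Index, e.Value)
	}
}
```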

type ExecOpts

type ExecOpts struct {
	Node           *p2p.QriNode
	AllowFloat     bool                   // allow floating-point numbers
	AllowSet       bool                   // allow set data type
	AllowLambda    bool                   // allow lambda expressions
	AllowNestedDef bool                   // allow nested def statements
	Secrets        map[string]interface{} // passed-in secrets (eg: API keys)
	Globals        skylark.StringDict
}

ExecOpts defines options for execution
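ExecFile takes its options as a variadic list of func(o *ExecOpts), and AddQriNodeOpt returns a function of that shape. The sketch below shows how that pattern usually works, using a trimmed local stand-in for ExecOpts so it runs without the qri packages. The names execOpts, defaultOpts, withSecrets, and resolveOpts are hypothetical, and the default values are assumptions for illustration, not skytf's actual defaults.

```go
package main

import "fmt"

// execOpts is a local stand-in for skytf.ExecOpts, trimmed to a few fields.
type execOpts struct {
	AllowFloat bool
	AllowSet   bool
	Secrets    map[string]interface{}
}

// defaultOpts plays the role of DefaultExecOpts. The values set here are
// assumptions for this sketch only.
func defaultOpts(o *execOpts) {
	o.AllowFloat = true
	o.AllowSet = true
}

// withSecrets is a hypothetical option constructor shaped like AddQriNodeOpt:
// it captures a value and returns a func that writes it into the options.
func withSecrets(s map[string]interface{}) func(o *execOpts) {
	return func(o *execOpts) { o.Secrets = s }
}

// resolveOpts applies the defaults first, then each caller-supplied option in
// order, so later options override earlier ones.
func resolveOpts(opts ...func(o *execOpts)) *execOpts {
	o := &execOpts{}
	defaultOpts(o)
	for _, opt := range opts {
		opt(o)
	}
	return o
}

func main() {
	o := resolveOpts(
		withSecrets(map[string]interface{}{"api_key": "example"}),
		func(o *execOpts) { o.AllowSet = false }, // inline options work too
	)
	fmt.Println(o.AllowFloat, o.AllowSet, o.Secrets["api_key"]) // prints "true false example"
}
```

Because options are plain functions, callers can pass a constructor's result or an inline closure interchangeably.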

type HTTPGuard added in v0.1.0

type HTTPGuard struct {
	NetworkEnabled bool
}

HTTPGuard protects network requests, only allowing them when the network is enabled

func (*HTTPGuard) Allowed added in v0.1.0

func (h *HTTPGuard) Allowed(req *http.Request) error

Allowed implements starlib/http RequestGuard

func (*HTTPGuard) DisableNtwk added in v0.1.0

func (h *HTTPGuard) DisableNtwk()

DisableNtwk prevents network calls from succeeding

func (*HTTPGuard) EnableNtwk added in v0.1.0

func (h *HTTPGuard) EnableNtwk()

EnableNtwk allows network calls
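The guard's behaviour can be sketched with the standard library alone. The types below are local stand-ins that mirror the documented method set; they are not the skytf implementation. The check helper is hypothetical and only builds a request without sending it.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errNtwkDisabled plays the role of skytf.ErrNtwkDisabled in this sketch.
var errNtwkDisabled = errors.New("network use is disabled")

// httpGuard is a local stand-in for skytf.HTTPGuard.
type httpGuard struct {
	NetworkEnabled bool
}

// Allowed rejects every request while the network is disabled.
func (h *httpGuard) Allowed(req *http.Request) error {
	if !h.NetworkEnabled {
		return errNtwkDisabled
	}
	return nil
}

// EnableNtwk allows network calls.
func (h *httpGuard) EnableNtwk() { h.NetworkEnabled = true }

// DisableNtwk prevents network calls from succeeding.
func (h *httpGuard) DisableNtwk() { h.NetworkEnabled = false }

// check builds a GET request (without sending it) and asks the guard about it.
func check(h *httpGuard, url string) error {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	return h.Allowed(req)
}

func main() {
	g := &httpGuard{} // the zero value starts with the network disabled
	fmt.Println(check(g, "https://example.com"))

	g.EnableNtwk()
	fmt.Println(check(g, "https://example.com"))

	g.DisableNtwk()
	fmt.Println(check(g, "https://example.com"))
}
```

In this sketch the zero value is the safe state: a guard that nobody has enabled refuses all requests.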

Directories

Path Synopsis
Package ds exposes the qri dataset document model into skylark