Documentation
Overview
Package script implements scripting support for defining Diviner studies through Starlark [1]. This package enables the floating point and lambda extensions of the language.
Script defines the following builtins for defining Diviner configurations (question marks indicate optional arguments):
discrete(v1, v2, v3...)
    Defines a discrete parameter that takes on the provided set of values (of type string, float, or int).
range(beg, end)
    Defines a range parameter with the given range (integers or floats).
minimize(metric)
    Defines an objective that minimizes a metric (a string).
maximize(metric)
    Defines an objective that maximizes a metric (a string).
localsystem(name, parallelism?)
    Defines a new local system with the provided name. The name is used to identify the system in tools. The parallelism limits the number of jobs that run on this system simultaneously. If parallelism is unset, it defaults to ∞ (no limit).
ec2system(name, ami, instance_profile, instance_type, disk_space?, data_space?, on_demand?, flavor?)
    Defines a new EC2-based system with the given name and configuration. The provided name is used to identify the system in tools.
    - ami: the EC2 AMI to use when launching new instances;
    - instance_profile: the IAM instance profile assigned to new instances;
    - instance_type: the instance type used;
    - disk_space: the amount of root disk space created;
    - data_space: the amount of data/scratch space created;
    - on_demand: (bool) whether to launch on-demand instance types;
    - flavor: the flavor of AMI: "ubuntu" or "coreos".
    See package github.com/grailbio/bigmachine/ec2system for more details on these parameters.
dataset(name, system, if_not_exist?, local_files?, script)
    Defines a dataset (diviner.Dataset):
    - name: the name of the dataset, which must be unique;
    - system: the system(s) to be used for run execution. The value is either a single system or a list of systems. In the latter case, the run uses any one of the systems that can allocate resources;
    - if_not_exist: a URL that is checked for conditional execution; dataset invocations are de-duplicated based on this URL;
    - local_files: a list of local files that must be made available in the script's execution environment;
    - script: the script that is run to produce the dataset.
run_config(script, system, local_files?, datasets?)
    Defines a run config (diviner.RunConfig) representing a single trial:
    - script: the script that is executed for this trial;
    - system: the system(s) to be used for run execution. The value is either a single system or a list of systems. In the latter case, the run uses any one of the systems that can allocate resources;
    - local_files: a list of local files that must be made available in the script's execution environment;
    - datasets: a list of datasets that must be available before the trial can proceed.
study(name, params, objective, run, replicates?, oracle?)
    A toplevel function that declares a named study with the provided parameters, runner, and objective.
    - name: a string specifying the name of the study;
    - objective: the optimization objective;
    - params: a dictionary naming the set of parameters to be optimized;
    - run: a function that returns a run_config for a set of parameter values. The first argument to the function is a dictionary of parameter values. A number of optional, named arguments follow: "id" is a string providing the run's Diviner ID, which may be used as an external key to reference a particular run; "replicate" is an integer specifying the replicate number associated with the run;
    - replicates: the number of replicates to perform for each parameter combination;
    - description: an optional string describing the study;
    - oracle: the oracle to use (grid search by default).
grid_search
    The grid search oracle.
skopt(base_estimator?, n_initial_points?, acq_func?, acq_optimizer?)
    A Bayesian optimization oracle based on skopt.
    The arguments are as in skopt.Optimizer, documented at https://scikit-optimize.github.io/optimizer/index.html#skopt.optimizer.Optimizer:
    - base_estimator: the base estimator to be used, one of "GP", "RF", "ET", or "GBRT" (default "GP");
    - n_initial_points: the number of evaluations to perform before estimating using the above estimator (default 10);
    - acq_func: the acquisition function to use for sampling new points, one of "LCB", "EI", "PI", or "gp_hedge" (default "gp_hedge");
    - acq_optimizer: the optimizer used to minimize the acquisition function, one of "sampling" or "lbfgs" (by default it is selected automatically).
command(script, interpreter?="bash -c", strip?=False)
    Runs a subprocess and returns its standard output as a string.
    - script: the script to run (a string);
    - interpreter: the command that runs the script; it defaults to "bash -c";
    - strip: whether to strip leading and trailing whitespace from the command's output.
    For example, command("print('foo'*2)", interpreter="python3 -c") produces "foofoo\n".
temp_file(contents)
    Creates a temporary file from the provided contents (a string) and returns its path.
enum_value(str)
    Internal representation of a protocol buffer enumeration value (see to_proto).
to_proto(dict)
    Renders a string-keyed dictionary in the text protocol buffer format. Dictionaries cannot currently be nested. Enumeration values created by enum_value are rendered as protocol buffer enumerations, not strings.
panic(messages...)
    Prints the messages and crashes the process.
Diviner configs must include one or more studies as toplevel declarations. Global Starlark objects are frozen after initial evaluation to prevent functions from modifying shared state.
[1] https://docs.bazel.build/versions/master/skylark/language.html
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
Types
This section is empty.