vpython - simple and easy VirtualEnv Python
vpython is a tool, written in Go, which enables the simple and easy invocation
of Python code in VirtualEnv environments.
vpython is a simple Python bootstrap which (almost) transparently wraps a
Python interpreter invocation to run in a tailored VirtualEnv environment. The
environment is expressed by a script-specific configuration file. This allows
each Python script to trivially express its own package-level dependencies and
run in a hermetic world consisting of just those dependencies.
When invoking such a script via vpython, the tool downloads its dependencies
and prepares an immutable VirtualEnv containing them. It then invokes the
script, now running in that VirtualEnv, through the preferred Python
interpreter.
vpython does its best not to use hacky mechanisms to achieve this. It uses
an unmodified VirtualEnv package, standard setup methods, and local system
resources. The result is transparent, canonical VirtualEnv environment
bootstrapping. vpython is also safe for concurrent invocation, using
filesystem-level locking to perform any environment setup and management.
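The locking idea can be sketched in a few lines. This is a minimal illustration, not vpython's implementation: it assumes a lock file created with O_CREAT | O_EXCL, which the operating system guarantees only one process can create at a time, and the env_lock name and lock path are hypothetical.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def env_lock(lock_path, timeout=30.0, poll_interval=0.05):
    """Hold an exclusive lock on lock_path for the duration of the block.

    Hypothetical helper: os.open with O_CREAT | O_EXCL fails if the file
    already exists, so only one process at a time can create the lock file.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # Another process is setting up the environment; wait and retry.
            if time.monotonic() >= deadline:
                raise TimeoutError("timed out waiting for %s" % lock_path)
            time.sleep(poll_interval)
    try:
        # Record the owner's PID to make stale locks easier to diagnose.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

Usage would look like `with env_lock("/tmp/venv-abc123.lock"): ...`, with environment creation inside the block. A lock file like this survives a crashed process, so a production tool would more likely use an OS-level lock (for example flock on POSIX) that the kernel releases automatically when the process exits.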
vpython itself is very fast. The wheel downloads and VirtualEnvs may also be
cached and re-used, ideally limiting the runtime overhead of vpython to just
one initial setup per unique environment.
Setup and Invocation
For the standard case, employing vpython is as simple as:
First, create and upload Python wheels for all of the packages that you will
need. This is done in an implementation-specific way (e.g., upload wheels as
packages to CIPD).
Once the packages are available:
- Add vpython to PATH.
- Write an environment specification naming packages.
- Change tool invocation from python to vpython.
Using vpython offers several benefits over direct Python invocation, especially
when vendoring packages. Notably, with vpython:
- Hermetic Python is trivially enabled everywhere.
- No sys.path manipulation is needed to load vendored or imported packages.
- Any tool can define which package(s) it needs without requiring coordination
or cooperation from other tools. (Note that the package must first be made
available for download.)
- Adding new Python dependencies to a project is non-invasive and immediate.
- Package downloading and deployment are baked into vpython and built on
fast and secure Google Cloud Platform technologies.
- No more custom bootstraps. Several projects and tools, including multiple
within the infra code base, have bootstrap scripts that vendor packages or
mimic a VirtualEnv. These are at best repetitive and, at worst, buggy and
insecure.
- Dependencies are explicitly stated, not assumed.
Why VirtualEnv?
VirtualEnv offers several benefits over system Python. Primarily, it is the
By using the same environment everywhere, Python invocations become
reproducible. A tool run on a developer's system will load the same versions
of the same libraries as it will on a production system. A production system
will no longer fail because it is missing a package, or because it has the
wrong version.
A direct mechanism for vendoring, sys.path manipulation, is nuanced, buggy,
and unsupported by the Python community. It is difficult to get right on all
platforms in all environments for all packages. Notorious examples are
protobuf and other domain-bound packages, which actively fight sys.path
inclusion. Using VirtualEnv means that any compliant Python package can
trivially be included in a project.
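Because a VirtualEnv is a standard interpreter feature rather than a sys.path patch, a script can confirm where it is running using ordinary interpreter attributes. The sketch below is illustrative; in_virtualenv is a hypothetical helper, and it assumes the documented conventions that the legacy virtualenv tool sets sys.real_prefix while the venv module makes sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtualenv(sys_module=sys):
    """Report whether the interpreter is running inside a virtual environment."""
    # Legacy virtualenv releases (common in the Python 2.7 era) set sys.real_prefix.
    if getattr(sys_module, "real_prefix", None) is not None:
        return True
    # The venv module and newer virtualenv releases make sys.prefix differ
    # from sys.base_prefix.
    base_prefix = getattr(sys_module, "base_prefix", sys_module.prefix)
    return sys_module.prefix != base_prefix


print("inside a VirtualEnv:", in_virtualenv())
```

The sys_module parameter exists only so the check can be exercised with a stand-in object; normal callers use the default.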
Why CIPD?
CIPD is a cross-platform service and associated tooling and packages used to
securely fetch and deploy immutable "packages" (~= zip files) into the local
file system. Unlike "package managers", it avoids platform-specific
assumptions, executable "hooks", or the complexities of dependency resolution.
vpython uses this as a mechanism for housing and deploying wheels.
CIPD serves as an infrastructure package deployment system. It is simple,
accessible, fast, and backed by resilient systems such as Google Storage and
AppEngine.
Unlike pip, a CIPD package is defined by its content, enabling precise package
matching instead of fuzzy version matching (e.g., numpy >= 1.2 and
numpy == 1.2 can both match multiple numpy packages in pip).
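The difference between fuzzy version matching and content-defined matching can be shown with a toy catalog. This is a sketch under stated assumptions: the artifacts are hypothetical, and SHA-256 of the file contents stands in for a CIPD instance ID (this document does not describe how CIPD actually computes one).

```python
import hashlib

# Hypothetical artifacts: (name, version, contents). Two distinct files share
# version 1.2.0 (e.g., a source archive and a prebuilt wheel).
AVAILABLE = [
    ("numpy", (1, 2, 0), b"numpy-1.2.0 source archive"),
    ("numpy", (1, 2, 0), b"numpy-1.2.0 prebuilt wheel"),
    ("numpy", (1, 3, 0), b"numpy-1.3.0 prebuilt wheel"),
]


def instance_id(contents):
    """Identify an artifact by a hash of its contents (assumed scheme)."""
    return hashlib.sha256(contents).hexdigest()


def match_version_at_least(name, minimum):
    """pip-style '>=' matching: every artifact whose version is >= minimum."""
    return [a for a in AVAILABLE if a[0] == name and a[1] >= minimum]


def match_version_exact(name, version):
    """pip-style '==' matching: still ambiguous when several files share a version."""
    return [a for a in AVAILABLE if a[0] == name and a[1] == version]


def match_instance(iid):
    """Content-defined matching: only the artifact with this exact hash."""
    return [a for a in AVAILABLE if instance_id(a[2]) == iid]
```

Here ">= 1.2.0" matches all three artifacts and "== 1.2.0" still matches two, while a content hash selects exactly one.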
CIPD also supports ACLs, enabling privileged Python projects to easily vendor
sensitive packages.
Why wheels?
A Python wheel is a simple binary distribution of Python code. A wheel can be
generic (pure Python) or system- and architecture-bound (e.g., 64-bit Mac OSX).
Wheels are preferred over eggs because they come packaged with compiled binaries.
This makes their deployment simple (unpack via pip) and reduces system
requirements and variation, since local compilation is not needed.
The increased management burden of maintaining separate wheels for the same
package, one for each architecture, is handled naturally by CIPD, removing the
only real pain point.
Wheel Guidance
This section contains recommendations for building or uploading wheel CIPD
packages, including platform-specific guidance.
CIPD wheel packages are CIPD packages that contain Python wheels. A given CIPD
package can contain multiple wheels for multiple platforms, but should only
contain one version of any given package for any given architecture/platform.
For example, you can bundle a Windows, Linux, and Mac OSX version of numpy and
coverage in the same CIPD package, but you should not bundle numpy==1.11 and
numpy==1.12 in the same package.
The reason for this is that vpython identifies which wheels to install by
scanning the contents of the CIPD package, and if multiple versions appear,
there is no clear guidance about which should be used.
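Before uploading, a package's wheel list can be checked against this rule. The sketch below is a hypothetical pre-upload check, not part of vpython. It reads the distribution, version, and platform fields from wheel filenames following PEP 427 naming and reports any distribution that has more than one version for the same platform.

```python
from collections import defaultdict


def find_version_conflicts(wheel_filenames):
    """Return {(distribution, platform tag): sorted versions} for every pair
    that appears with more than one version in the same CIPD package.

    Assumes PEP 427 names: {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    versions = defaultdict(set)
    for filename in wheel_filenames:
        if not filename.endswith(".whl"):
            continue  # ignore non-wheel files in the package
        parts = filename[: -len(".whl")].split("-")
        distribution, version, platform_tag = parts[0], parts[1], parts[-1]
        versions[(distribution, platform_tag)].add(version)
    return {key: sorted(v) for key, v in versions.items() if len(v) > 1}
```

Bundling numpy and coverage for several platforms produces no conflicts, while adding a second numpy version for the same platform is reported.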
Mac OSX
Use the m ABI suffix and the macosx_... platform. vpython installs wheels
with the --force flag, so slight binary incompatibilities (e.g., specific OSX
versions) can be glossed over. For example:
coverage-4.3.4-cp27-cp27m-macosx_10_10_x86_64.whl
Linux
Use wheels with the mu ABI suffix and the manylinux1 platform. For example:
coverage-4.3.4-cp27-cp27mu-manylinux1_x86_64.whl
Windows
Use wheels with the cp27m or none ABI tag. For example:
coverage-4.3.4-cp27-cp27m-win_amd64.whl
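The ABI and platform tags named in the guidance above are fields of the wheel filename defined by PEP 427. The following sketch splits the three example filenames into those fields; parse_wheel_filename and WheelTags are illustrative names, not a vpython or pip API.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into the fields defined by PEP 427."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel filename: %r" % filename)
    return WheelTags(distribution, version, build, python_tag, abi_tag, platform_tag)


for name in (
    "coverage-4.3.4-cp27-cp27m-macosx_10_10_x86_64.whl",
    "coverage-4.3.4-cp27-cp27mu-manylinux1_x86_64.whl",
    "coverage-4.3.4-cp27-cp27m-win_amd64.whl",
):
    tags = parse_wheel_filename(name)
    print(tags.abi_tag, tags.platform_tag)
# prints:
# cp27m macosx_10_10_x86_64
# cp27mu manylinux1_x86_64
# cp27m win_amd64
```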
Setup and Invocation
vpython can be invoked by replacing python in the command line with vpython.
vpython works with a default Python environment out of the box. To add
vendored packages, you need to define an environment specification file that
describes which wheels to install.
An environment specification file is a text-format protobuf of the Spec
message type. An example is:
# Any 2.7 interpreter will do.
python_version: "2.7"
# Include "numpy" for the current architecture.
wheel {
name: "infra/python/wheels/numpy/${platform}-${arch}"
version: "version:1.11.0"
}
# Include "coverage" for the current architecture.
wheel {
name: "infra/python/wheels/coverage/${platform}-${arch}"
version: "version:4.1"
}
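The wheel names above contain ${platform} and ${arch} placeholders that are resolved for the current machine. Python's string.Template happens to use the same ${name} syntax, so the substitution step can be sketched as follows. The platform and architecture spellings ("linux", "mac", "windows", "amd64", "386") are assumptions for illustration, not vpython's documented mapping.

```python
import platform
import string
import sys


def current_platform_and_arch():
    """Guess ${platform} and ${arch} values for this machine (assumed spellings)."""
    if sys.platform.startswith("linux"):
        plat = "linux"
    elif sys.platform == "darwin":
        plat = "mac"
    elif sys.platform in ("win32", "cygwin"):
        plat = "windows"
    else:
        plat = sys.platform
    machine = platform.machine().lower()
    arch_names = {"x86_64": "amd64", "amd64": "amd64", "i386": "386", "i686": "386"}
    return plat, arch_names.get(machine, machine)


def expand_package_name(template, plat, arch):
    """Substitute ${platform} and ${arch} in a wheel package name."""
    return string.Template(template).substitute(platform=plat, arch=arch)


print(expand_package_name(
    "infra/python/wheels/numpy/${platform}-${arch}", *current_platform_and_arch()))
```

On a 64-bit Linux machine this sketch would yield infra/python/wheels/numpy/linux-amd64.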
This specification can be supplied in one of four ways:
- Explicitly, as a command-line option to vpython (-spec).
- Implicitly, as a file alongside your entry point. For example, if you are
running test_runner.py, vpython will look for test_runner.py.vpython next to
it and load the environment from there.
- Implicitly, inlined in your main file. vpython will scan the main entry
point for sentinel text and, if present, load the specification from that.
- Implicitly, through the VPYTHON_VENV_SPEC_PATH environment variable. This
is set by a vpython invocation so that chained invocations default to the
same environment.
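The lookup can be sketched as a chain of fallbacks. This is a minimal sketch, not vpython's code. It assumes that the sources are tried in the order listed above (this document does not state the precedence), and the inline sentinel strings and the "# "-comment convention are hypothetical, since the actual sentinel text is not given here.

```python
import os

# Hypothetical sentinels delimiting an inline specification.
BEGIN_SENTINEL = "[VPYTHON:BEGIN]"
END_SENTINEL = "[VPYTHON:END]"


def inline_spec(script_text):
    """Return the spec text between the sentinels, or None if absent."""
    collecting, body = False, []
    for line in script_text.splitlines():
        if BEGIN_SENTINEL in line:
            collecting = True
        elif END_SENTINEL in line and collecting:
            return "\n".join(body)
        elif collecting:
            # Assumed convention: spec lines are Python comments ("# ...").
            body.append(line.lstrip().lstrip("#").lstrip())
    return None


def resolve_spec(script_path, cli_spec_path=None, environ=os.environ):
    """Return (source, spec_text), trying each source in the assumed order."""
    # 1. Explicit -spec option.
    if cli_spec_path:
        with open(cli_spec_path) as f:
            return "cli", f.read()
    # 2. A <script>.vpython file next to the entry point.
    sidecar = script_path + ".vpython"
    if os.path.isfile(sidecar):
        with open(sidecar) as f:
            return "sidecar", f.read()
    # 3. A specification inlined in the entry point between sentinels.
    with open(script_path) as f:
        embedded = inline_spec(f.read())
    if embedded is not None:
        return "inline", embedded
    # 4. The variable set by an enclosing vpython invocation.
    env_path = environ.get("VPYTHON_VENV_SPEC_PATH")
    if env_path:
        with open(env_path) as f:
            return "env", f.read()
    # Nothing found: fall back to the default environment.
    return "default", None
```

The environ parameter is injectable so the fallback to VPYTHON_VENV_SPEC_PATH can be tested without touching the real process environment.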
Optimization and Caching
vpython has several levels of caching that it employs to optimize setup and
invocation overhead.
VirtualEnv
Once a VirtualEnv specification has been resolved, its resulting pinned
specification is hashed and used as a key to that VirtualEnv. Other vpython
invocations expressing the same environment will naturally re-use that
VirtualEnv instead of creating their own.
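The keying step can be illustrated with a stable hash over a canonical serialization. This sketch assumes JSON with sorted keys as the canonical form and SHA-256 as the hash; vpython's actual serialization and hash are not described in this document, and venv_key is a hypothetical name.

```python
import hashlib
import json


def venv_key(python_version, pinned_wheels):
    """Derive a stable VirtualEnv key from a fully pinned specification.

    pinned_wheels maps each resolved package name to an exact pinned version.
    Sorting makes the key independent of the order in which wheels were declared.
    """
    canonical = json.dumps(
        {"python_version": python_version, "wheels": sorted(pinned_wheels.items())},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two invocations that pin the same wheels, in any order, compute the same key and can share one VirtualEnv directory (for example os.path.join(root, key)); changing any pinned version yields a different key.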
Download Caching
Download mechanisms (e.g., CIPD) can optionally include a package cache to avoid
the overhead of downloading and/or resolving a package multiple times.
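A download cache of this kind can be sketched as a directory keyed by package and version. The DownloadCache class and its fetch callback are hypothetical; fetch stands in for whatever actually downloads a package (e.g., a CIPD client) and is only called on a cache miss.

```python
import hashlib
import os


class DownloadCache:
    """A minimal on-disk cache keyed by (package, version)."""

    def __init__(self, root, fetch):
        self.root = root
        self.fetch = fetch  # hypothetical callable: (package, version) -> bytes
        os.makedirs(root, exist_ok=True)

    def _path(self, package, version):
        key = hashlib.sha256(("%s@%s" % (package, version)).encode("utf-8")).hexdigest()
        return os.path.join(self.root, key)

    def get(self, package, version):
        path = self._path(package, version)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()  # cache hit: no download needed
        data = self.fetch(package, version)
        tmp = "%s.tmp.%d" % (path, os.getpid())
        with open(tmp, "wb") as f:
            f.write(data)
        # Rename into place so a concurrent reader never sees a partially
        # written file (os.replace is atomic on POSIX).
        os.replace(tmp, path)
        return data
```

Calling get twice for the same package and version invokes fetch only once; the second call is served from disk.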
Migration
Command-line
vpython is a natural replacement for python in the command line:
python ./foo/bar/baz.py -d --flag value arg arg whatever
Becomes:
vpython ./foo/bar/baz.py -d --flag value arg arg whatever
The vpython tool accepts its own command-line arguments. When passing them,
use a -- separator to differentiate between vpython options and python
options:
vpython -spec /path/to/spec.vpython -- ./foo/bar/baz.py
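Tools that launch Python scripts programmatically can be migrated the same way. The sketch below builds the argument list shown above; vpython_argv is a hypothetical helper, and it emits the -- separator only when a vpython option such as -spec is present.

```python
def vpython_argv(script, script_args=(), spec_path=None):
    """Build a vpython command line for running script with script_args."""
    argv = ["vpython"]
    if spec_path is not None:
        # vpython's own options come first, then "--" ends them.
        argv += ["-spec", spec_path, "--"]
    argv.append(script)
    argv.extend(script_args)
    return argv
```

The result can be passed to subprocess.check_call, assuming vpython is on PATH; for example, vpython_argv("./foo/bar/baz.py", spec_path="/path/to/spec.vpython") reproduces the command above.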
Shebang (POSIX)
If your script uses implicit specification (file or inline), replacing python
with vpython in your shebang line will automatically work.
#!/usr/bin/env vpython
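A shebang only takes effect when the script is executed directly; running python script.py bypasses it. As a hedged sketch, a script can warn when it was not started through vpython, on the assumption that the VPYTHON_VENV_SPEC_PATH variable (set by vpython invocations) is a reliable signal. launched_by_vpython is a hypothetical helper, and this check is a heuristic rather than a documented vpython feature.

```python
#!/usr/bin/env vpython
import os
import sys


def launched_by_vpython(environ=os.environ):
    """Heuristic: vpython sets VPYTHON_VENV_SPEC_PATH for the processes it runs."""
    return bool(environ.get("VPYTHON_VENV_SPEC_PATH"))


if __name__ == "__main__":
    if not launched_by_vpython():
        sys.stderr.write("warning: not running under vpython; dependencies may be missing\n")
```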