Superpose
Superpose is a library for creating Go compiler wrappers/plugins that support transforming packages in other
"dimensions" and making them callable from the original package.
Quick start example from the example/mocktime README, can build the tool:
go build ./example/mocktime/superpose-mocktime
Then can be executed as toolexec
with the just-built executable:
go run -toolexec /path/to/superpose-mocktime ./example/mocktime
Note how there are log statements outside the dimension with normal timestamps and inside the dimension with
mocked timestamps. This is because time.Now()
is altered in the separate dimension and therefore all things that
reference time
in that dimension are altered too, e.g. the log
package.
WARNING: This library is an intentionally-untagged proof of concept with no guarantees on future maintenance. Many
advanced uses may not be supported.
Contents
Overview
This library leverages the -toolexec
option of go
build
/run
/test
to intercept compilation and allow
transforming certain packages in a separate dimension that are compiled alongside the untransformed code. Then a
"bridge" method can call into the other dimension. Developers simply have to write the transformer and most details
concerning caching, building, and other nuances are taken care of.
The uses of this are the same as any other compile-time transformer. Potential uses:
- Mocking things like current time and time movement
- Compile-time macros and code generation
- Aspect oriented use cases like injecting log info
- "Sandboxing" and other runtime call restrictions (albeit not secure)
- External manipulation of third party or standard library packages
- Tooling support (e.g. how
-race
, code coverage, go:embed
, etc can work)
Granted, as with all tools like this and especially in the Go ecosystem, compile-time code transformation should be the
last resort. It should only be used when it's really needed. It can also be a bit unwieldy for the compiling developer
as they have to opt-in with a special argument.
Examples
- example/logger - Shows replacing standard library code by replacing "Hello" with "Aloha" in all logs
when running under the other dimension. Also shows a test case.
- example/maporder - More advanced example showing how to have deterministic map iteration
- example/mocktime - Shows a basic way to replace
time.Now()
for a mock clock
See the README in each example for how to run it.
Usage
For a basically-unusable simple example, let's say we want to change every function named "ReturnString" in our package
to return "foo".
Terms in common use:
- Bridge Function - An exported function in a file that can be used as a "bridge" call to another dimension. When a
var
in the same file is present with a type of func
that is the exact signature of the exported function and a
comment in the form of //my-dimension:MyFuncName
, it is set by the Superpose compiler to that same function in the
other dimension.
- Dimension - A string name of a "dimension" that a transformer applies to. All packages, including applicable
dependency packages, that are transformed for a dimension are put in mangled package paths to differentiate themselves
from the un-transformed code.
- In-var - A
bool
var
with a comment in the form of //my-dimension:<in>
that Superpose sets to true
when
compiled in that dimension (but remains false in all other places including normal code).
- Transformer - Code for a dimension that says which packages are applied to the dimension and provides patches to files
inside that dimension.
We must first create a transformer. This is the main executable that is used as Go's toolexec
, meaning it is invoked
for every Go build/compile/link/etc command. The transformer applies to a certain dimension and set of packages.
Assuming we want to use dimension name my-dimension
, here's how it might look:
package main
import (
"context"
"go/ast"
"strings"
"github.com/cretz/superpose"
)
func main() {
superpose.RunMain(
context.Background(),
superpose.Config{
// We use the current content ID of the executable of our version which
// adds a slight performance penalty
Version: superpose.MustLoadCurrentExeContentID(),
Transformers: map[string]superpose.Transformer{"my-dimension": transformer{}},
// This is very noisy if verbose by default. Consider only setting this as
// true during development.
Verbose: true,
},
superpose.RunMainConfig{},
)
}
type transformer struct{}
func (transformer) AppliesToPackage(ctx *superpose.TransformContext, pkgPath string) (bool, error) {
return strings.HasPrefix(pkgPath, "example.com/mymodule"), nil
}
func (transformer) Transform(
ctx *superpose.TransformContext,
pkg *superpose.TransformPackage,
) (*superpose.TransformResult, error) {
// Change any ReturnString function to return "foo"
res := &superpose.TransformResult{
// We set this to true so we can make sure our patched file appears like it
// was named the original file name
AddLineDirectives: true,
// If verbose is on, this will log the entirety of every patched file, which
// we want during development
LogPatchedFiles: true,
}
// Go over each file in the package
for _, file := range pkg.Syntax {
for _, decl := range file.Decls {
// Add patch if it's the func we want
decl, _ := decl.(*ast.FuncDecl)
if decl == nil || decl.Name.Name != "ReturnString" {
continue
}
res.Patches = append(res.Patches, &superpose.Patch{
// We're replacing from just after opening brace to just before closing
// brace
Range: superpose.Range{Pos: decl.Body.Lbrace + 1, End: decl.Body.Rbrace},
// In addition to our return statement, we also want to set a line
// directive before the closing brace to what it was before so all other
// line numbers of the file still read the same
Str: fmt.Sprintf(
` return "foo" /*line :%v*/`,
pkg.Fset.Position(decl.Body.Rbrace).Line,
),
})
}
}
return res, nil
}
In any package underneath example.com/mymodule
that has a ReturnString
top-level function, we will change it to just
return "foo"
. A more advanced example would have done some type checking to confirm the function looked right, but
this is a simplified example.
Note how we built a patch and set AddLineDirectives: true
and added /*line :<line>*/
to our patch. Superpose works
on patches instead of AST alterations. This is important to retain line information. When we may alter line counts but
we want to appear in stack traces and debugger as the original line, we need AddLineDirectives: true
to fix the
filename, and then we need to set line directives for the
compiler.
Once that transformer is built as an executable, we can now use it in -toolexec
. -toolexec
build flag is accepted in
all go
calls that may build, e.g. go build
, go run
, go test
, etc. So if we had a user_code.go
file, we
could:
go run -toolexec /path/to/my-transformer user_code.go
There is a caveat however for build tags. Go does not provide toolexec
executables a way to know what build tags are
in use by itself and dependencies. Therefore, if we set -tags
on the go
command, we have to set -buildtags
for the
toolexec
. For example:
go run -tags mytag -toolexec "/path/to/my-transformer -buildtags mytag" user_code.go
This ensures build tags are respected when building the other dimensions.
Referencing another dimension
Now that we have a transformer for a dimension and know how to build with it, we need to be able to call into the
dimension. Say we have this file at example.com/mymodule/otherpkg/return_string.go
:
package otherpkg
func ReturnString() string { return "original string" }
Now say we want to call otherpkg
in the other dimension. If we just call otherpkg.ReturnString()
we'll get
"original string"
. To call the other dimension we have to make a "bridge function".
A bridge function is an exported function in a file accompanied by a var
of that exact function signature, including
parameter/return var names, in that same file. The var has a special comment in the form //my-dimension:MyFunc
that
tells the Superpose compiler that it should be set with the same function from that dimension. The package for the file
containing this bridge function must also return true
for Transformer.AppliesToPackage
for that dimension.
Here's an example, say at file example.com/mymodule/cmd/main.go
, that has a bridge function to the my-dimension
dimension:
package main
import (
"fmt"
"example.com/mymodule/otherpkg"
)
func CallReturnString() string { return otherpkg.ReturnString() }
var CallReturnStringInMyDimension func() string //my-dimension:CallReturnString
func main() {
fmt.Printf("Normal code: %v\n", CallReturnString())
fmt.Printf("Other dimension code: %v\n", CallReturnStringInMyDimension())
}
Running:
go run -toolexec /path/to/my-transformer ./cmd
Will output:
Normal code: original string
Other dimension code: foo
Bridge functions do not have to be in the main package. Any number of bridge functions can be defined. Since
package-level vars are different in different dimensions, it may make sense to have a bridge function reference/mutate
them. Note, types from a transformed package can't be used as parameter/return to the bridge function because it will
appear as another type in the bridge and a compile error will occur.
Knowing we're in a dimension
Sometimes in transformed code we need to know whether we're running in a dimension or not. This can be done with a
"in-var" which is a special bool var
with a comment in the form //my-dimension:<in>
where <in>
is literally the
term. For example, if we had:
package main
import "fmt"
var inMyDimension bool //my-dimension:<in>
func PrintSomething() {
if inMyDimension {
fmt.Println("In my dimension")
} else {
fmt.Println("In normal code")
}
}
var PrintSomethingInMyDimension func() string //my-dimension:PrintSomething
func main() {
PrintSomething()
PrintSomethingInMyDimension()
}
Then running with the toolexec
, the output will be:
In normal code
In my dimension
These in-vars can be in any transformed package and any number of them may be created. They do not have to be exported.
Testing
An earlier incarnation of this library had an entire test framework, but it became very apparent it was much clearer to
just pass toolexec
to go test
too and run code and build bridge functions to test across dimensions there.
Therefore to test a transformer, just write tests with bridge functions as needed to assert the transformer did the
right thing, and run go test
with -toolexec
of the transformer. This means there is transformer build a step that
runs before go test
which can be automated as needed.
Advanced
Patching
TransformResult
contains a set of patches that reference positions on the file set of the incoming package. Each
Patch
contains a required Range
it replaces that contains a required inclusive start Pos
and an optional exclusive
End
. If End
is 0/unset, the patch will be an insertion instead of a replacement. The required Str
of the patch
contains the string contents to patch.
Some notes about patches:
- Patches cannot overlap, so care must be taken by the transformer
- Internally, Superpose patches the package name and any transformed imports, so the transformer must make sure not to
overlap with those patches
- If
Str
contains {{
, it is assumed to be a Go template
- The patch can contain
Captures
which is a named map of ranges that are made available via the Captures
object in
the template
Some guidance on patching:
- Patches should alter the existing code as little as possible to help preserve line counts
go fmt
is not applied on patched code
- For example, many lines of code can be put on a single line separated by semicolons, which Go supports
- A semicolon can be added after an existing statement to add another statement on the same line
- Using an existing AST position and adding or subtracting
1
will reference the character right after or before
respectively
- If it is known a patch may alter line count, use a
/*line :<line>*/
-style
line directive afterwards to put the compiler back on the
right line count for successive code
- In Go, it is acceptable to return early or panic early leaving dead code, so often there is no need to be concerned
with removing code in these situations
- It is often better to immediately delegate to some proper written package for a task than to have a complicated set of
patches (see next section)
- Although there is a
WrapWithPatch
, if patching things that are not full replacements, two patches should be used
make a "wrapping" patch (like one that calls another function) - one for the LHS and one for the RHS (if needed).
This is because an inner expression of what is being wrapped may also be transformed by the transformer and would
cause patch overlap. Granted if it is known that nothing internal could ever be recursively transformed, no need to
follow this suggestion.
When transforming, sometimes it is necessary to depend on a package that may not have been depended on by the
transformed package before. The transformer is expected to patch the import
s necessary in source to do this. However,
the linker needs to know about any new packages to include at compile time. This can be done by setting the dependency
package name as a key on the TransformResult.IncludeDependencyPackages
map. If the package is already a dependency of
this package, it will have no effect.
When Go compiles a package, it first collects and compiles its dependencies. Go expects all dependencies are compiled
before the current package is compiled. Therefore, any dependencies added to this map must have already been compiled.
And it must also be resolvable. go list -f "{{.Export}}" -export qualified/pkg/path
is used to obtain the package
file.
Users are encouraged to have their transformer code and their runtime code explicitly reference the package that may
be needed somewhere in code so that it is included as a go.mod
dependency at compile time and runtime. In cases
where the transformer is compiled somewhere differently than the code that uses it is compiled, this can still result in
cases where the dependency is not yet compiled. In these cases, it is encouraged to build the transformer where the code
is built, or if that can't be done, technically go build
can be done on the package as needed.
Caching
Go uses a concept of a "build ID" for caching output and determining whether to re-run. This is built on a set of
slash-delimited hashes: a leading hash representing input called an "action ID", a trailing hash representing output
called a "content ID" (which may be unset if not yet compiled), and any content in between. See comments at the top of
buildid.go in the Go source if curious
about details.
The build ID can be affected by content changes, Go version changes, build tag changes, different build flags, etc.
Superpose leverages this behavior by just altering the existing action IDs with reproducible dimension-specific hashes
for the other dimensions and caches the results in its own cache. Since this hash is built by dimension name and not
patched content, it can be stale if the transformer changes. So a required Version
must be set in the Superpose
config.
Version
should be unique for each change of a transformer that would alter code. Otherwise old cached builds from a
previous version of the same transformer may be used. Many developers may choose to use
superpose.MustLoadCurrentExeContentID()
which is the content ID of the current executable (so it changes when the exe
changes). This is a reasonable default choice but it has two downsides:
- It runs
go tool buildid <current exe>
on every single Go compile/link command. So now every package that has to be
compiled will run this fast separate process, but it's so fast it's usually negligible.
- Cache will be invalidated for the slightest change to the transformer, even if it doesn't result in code changes to
the transformed output.
If either of these are a concern, the Version
field can be manually maintained.
Additional flags
Executables for toolexec
built with Superpose already accept flags like -verbose
and -buildtags
. Users can add
their own options to be set by a user using superpose.Config.AdditionalFlags
. Don't forget to properly quote the flags
when compiling, e.g.:
go build -toolexec "/path/to/my-transformer -myflag flag value" some_code.go
Development and debugging
Effort has not currently been made to support step-based debuggers in toolexec. Therefore, the only approach to having
development/debugging details is to use logging.
During development, superpose.Config.Verbose
can be true
to show a lot of output during compilation. It can also be
set to true via the -verbose
flag on the toolexec
executable. Verbose will also include any logs to
ctx.Superpose.Debugf
on the context inside the transformer. Also, TransformResult.LogPatchedFiles
can be set to
true
on the transformer result to have full patched files dumped via that same logging mechanism (so still only
visible if Verbose
is set).
How it works in detail
High-level Go compilation primer
When go build
is run, here's (mostly) what happens:
compile -V=full
is called to get the tool build ID to affect build IDs of the compiler's inputs/outputs
compile
is run for each package, with dependencies run before dependents
- All files in the package are provided as arguments
- A build ID is provided which is just the action ID (i.e. a unique hash of the content to compile)
- If this package appears to have been built in the past for this action ID, compile is not called for it. Use
go env GOCACHE
to see where by default these are cached
- A temp output location is given for compilation results
- An
importcfg
is given which is a file containing a list of dependency packages already compiled that the
package being compiled needs
- Compilation is performed
link
is called to build the executable
- An
importcfg
is given which contains all built dependency packages for the entire program
- Link builds the executable
When -toolexec
is added to go
build calls, instead of the above steps executing directly, that tool is called for
each of the above steps where the compile/link/etc executables with their args just become the tool's args. Therefore
Superpose just intercepts -toolexec
calls.
On compile
When toolexec
is executed for the compile step, Superpose does two steps defined below - "compile dimensions" and
"build bridge". Then it continues the compilation, possibly using updated arguments from the last step.
Compile dimensions
If any transformers apply to the given package and if that package has not already been compiled in that dimension
before for its given action ID, we run the package through the transformers as described below.
- Load the package
- Call transform on the package to get patches
- For all imports in the package to other applicable-to-that-dimension packages, add patches to replace those import
paths with the mangled dimension path equivalents
- For all in-vars referencing the dimension, patch them to be set to
true
- If
AddLineDirectives: true
, for every file that has a patch on it, add a line directive at the top of the file
telling the compiler to treat it as the original file name
- Apply all patches as temporary files
- Copy the original compile args but replace all patched file paths with their patched file locations
- Update the package argument of compile args to dimension-mangled path
- Update the build ID argument of compile args for a derived hash for the dimension
- Update the
importcfg
argument to a temp importcfg
file containing updated dependencies that are applicable to
this dimension and containing dependencies that were explicitly asked to be included by the transformer
- Update the output argument of compile args to a temp file placeholder
- Run compile
- Copy the built package file to the Superpose build cache
- Add metadata in the Superpose build cache containing explicitly-requested dependencies to include
Build bridge
If there are any bridge function vars in the package:
- Build a temp init file that, for each bridge function var
- Imports the dimension referenced if not already done
- Adds an init statement that populates that var with a reference to the bridge function from the other dimension
- Update compile args by adding a temp init file to the end of the to-be-compiled file list
- Update
importcfg
compile arg with a new file that contains the contents of the existing file and adds new package
references for the dimension-specific packages that were imported
On link
Before the downstream link
call is performed, the following argument alterations are made:
- Create a new
importcfg
file that has the contents of the old one
- For every package in the
importcfg
file that applies to a dimension, add the dimension-specific package too
- Load dimension-package metadata and add all explicitly included dependencies to the
importcfg
file if not already
there
Caveats
TODO(cretz):
- reflect.Type.PkgPath() is the dimension package
- But even that can be patched if it must be
- Perf and mem size
- Types that can't cross the boundary
- "internal" packages
Why
At Temporal, workflows in Go are written using our SDK. Workflow code is required to be
deterministic and isolated. Currently, Temporal just asks that users to not use the non-deterministic constructs in Go
(i.e. async constructs, external stuff, map ranging, global state mutation). This is part of a research project to see
if we can make an insecure sandbox that does make those constructs deterministic so the code doesn't have to concern
itself with safety. So we can make map ranging deterministic, do goroutine-local globals, use deterministic emulations
of Go async constructs, and somewhat restrict external system access in an acceptably-not-foolproof way.
TODO
- CI
- Support more options for compile time alteration including:
- Wrapping the entire
go
command and injecting toolexec
on build
, e.g. my-go build ...
would become
go build -toolexec "/path/to/my-go toolexec"
go:generate
or manual code generation that writes entire patched set of source somewhere for easy compilation
- Support altering primary code instead of just other dimensions
- Was out of scope for initial needs
- Update example/maporder to support insertion-based ordering
- Add an example for "globals sandbox" which replaces all globals and global access with a wrapper and does a
goroutine-local approach to maintaining state
- Tests:
internal
package transformed
- Stack trace and debugging
- Support other build flags like
-modfile
and really anything