compile

v0.2.3
Published: Nov 6, 2022 License: BSD-3-Clause Imports: 18 Imported by: 0

README

Introduction to the Go compiler

cmd/compile contains the main packages that form the Go compiler. The compiler may be logically split into several phases, which we will briefly describe alongside the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to the compiler. Roughly speaking, these translate to the first two and last two phases we are going to list here. A third term, "middle-end", often refers to much of the work that happens in the second phase.

Note that the go/* family of packages, such as go/parser and go/types, are mostly unused by the compiler. Since the compiler was initially written in C, the go/* packages were developed to enable writing tools working with Go code, such as gofmt and vet. However, over time the compiler's internal APIs have slowly evolved to be more familiar to users of the go/* packages.

It should be clarified that the name "gc" stands for "Go compiler", and has little to do with uppercase "GC", which stands for garbage collection.

1. Parsing

  • cmd/compile/internal/syntax (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis), parsed (syntax analysis), and a syntax tree is constructed for each source file.

Each syntax tree is an exact representation of the respective source file, with nodes corresponding to the various elements of the source such as expressions, declarations, and statements. The syntax tree also includes position information which is used for error reporting and the creation of debugging information.

2. Type checking

  • cmd/compile/internal/types2 (type checking)

The types2 package is a port of go/types to use the syntax package's AST instead of go/ast.

3. IR construction ("noding")

  • cmd/compile/internal/types (compiler types)
  • cmd/compile/internal/ir (compiler AST)
  • cmd/compile/internal/typecheck (AST transformations)
  • cmd/compile/internal/noder (create compiler AST)

The compiler middle end uses its own AST definition and representation of Go types carried over from when it was written in C. All of its code is written in terms of these, so the next step after type checking is to convert the syntax and types2 representations to ir and types. This process is referred to as "noding."

There are currently two noding implementations:

  1. irgen (aka "-G=3" or sometimes "noder2") is the implementation used starting with Go 1.18, and

  2. Unified IR is another, in-development implementation (enabled with GOEXPERIMENT=unified), which also implements import/export and inlining.

Up through Go 1.18, there was a third noding implementation (just "noder" or "-G=0"), which directly converted the pre-type-checked syntax representation into IR and then invoked package typecheck's type checker. This implementation was removed after Go 1.18, so now package typecheck is only used for IR transformations.

4. Middle end

  • cmd/compile/internal/deadcode (dead code elimination)
  • cmd/compile/internal/inline (function call inlining)
  • cmd/compile/internal/devirtualize (devirtualization of known interface method calls)
  • cmd/compile/internal/escape (escape analysis)

Several optimization passes are performed on the IR representation: dead code elimination, (early) devirtualization, function call inlining, and escape analysis.
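To make these passes a little more concrete, here is a small program whose functions are the kind of code they look at. It is only a sketch: which calls are inlined and which values are moved to the heap depends on the compiler version, and the names point, sum, newPoint, and local are made up for the example. Compiling such code with the -m flag listed under Flags prints the decisions actually taken.

```go
package main

import "fmt"

type point struct{ x, y int }

// sum is a small leaf function, the kind of call the inliner considers.
func sum(p point) int { return p.x + p.y }

// newPoint returns the address of a local variable. Escape analysis has
// to notice that p outlives the call, so it cannot simply live in
// newPoint's stack frame.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

// local takes the address of a value but never lets the pointer leave
// the function, so escape analysis may keep the value on the stack.
func local() int {
	p := &point{1, 2}
	return p.x + p.y
}

func main() {
	fmt.Println(sum(point{1, 2}), newPoint(3, 4).y, local())
}
```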

5. Walk

  • cmd/compile/internal/walk (order of evaluation, desugaring)

The final pass over the IR representation is "walk," which serves two purposes:

  1. It decomposes complex statements into individual, simpler statements, introducing temporary variables and respecting order of evaluation. This step is also referred to as "order."

  2. It desugars higher-level Go constructs into more primitive ones. For example, switch statements are turned into binary search or jump tables, and operations on maps and channels are replaced with runtime calls.

6. Generic SSA

  • cmd/compile/internal/ssa (SSA passes and rules)
  • cmd/compile/internal/ssagen (converting IR to SSA)

In this phase, IR is converted into Static Single Assignment (SSA) form, a lower-level intermediate representation with specific properties that make it easier to implement optimizations and to eventually generate machine code from it.

During this conversion, function intrinsics are applied. These are special functions that the compiler has been taught to replace with heavily optimized code on a case-by-case basis.

Certain nodes are also lowered into simpler components during the AST to SSA conversion, so that the rest of the compiler can work with them. For instance, the copy builtin is replaced by memory moves, and range loops are rewritten into for loops. Some of these currently happen before the conversion to SSA due to historical reasons, but the long-term plan is to move all of them here.

Then, a series of machine-independent passes and rules are applied. These do not concern any single computer architecture, and thus run on all GOARCH variants. These passes include dead code elimination, removal of unneeded nil checks, and removal of unused branches. The generic rewrite rules mainly concern expressions, such as replacing some expressions with constant values, and optimizing multiplications and float operations.
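The idea behind expression rewrite rules can be sketched with a toy rewriter. This is not the compiler's rule engine or its SSA data structures; the expr type and the four rules below are assumptions chosen for illustration (constant folding for + and *, plus x*1 and x+0 simplification).

```go
package main

import "fmt"

// expr is a toy expression node: a constant, a variable, or a binary op.
type expr struct {
	op   string // "const", "var", "+", or "*"
	val  int    // value when op == "const"
	name string // name when op == "var"
	l, r *expr  // operands of a binary op
}

func konst(v int) *expr               { return &expr{op: "const", val: v} }
func variable(n string) *expr         { return &expr{op: "var", name: n} }
func bin(op string, l, r *expr) *expr { return &expr{op: op, l: l, r: r} }

// rewrite applies the toy rules bottom-up, so that folding an inner
// expression can enable a rule on the outer one.
func rewrite(e *expr) *expr {
	if e.op == "const" || e.op == "var" {
		return e
	}
	l, r := rewrite(e.l), rewrite(e.r)
	switch {
	case e.op == "+" && l.op == "const" && r.op == "const":
		return konst(l.val + r.val) // fold constant addition
	case e.op == "*" && l.op == "const" && r.op == "const":
		return konst(l.val * r.val) // fold constant multiplication
	case e.op == "*" && r.op == "const" && r.val == 1:
		return l // x * 1 => x
	case e.op == "+" && r.op == "const" && r.val == 0:
		return l // x + 0 => x
	}
	return bin(e.op, l, r)
}

// String renders the expression with explicit parentheses.
func (e *expr) String() string {
	switch e.op {
	case "const":
		return fmt.Sprint(e.val)
	case "var":
		return e.name
	}
	return "(" + e.l.String() + " " + e.op + " " + e.r.String() + ")"
}

func main() {
	e := bin("+", variable("y"), bin("*", konst(2), konst(4)))
	fmt.Println(e, "=>", rewrite(e))
}
```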

7. Generating machine code

  • cmd/compile/internal/ssa (SSA lowering and arch-specific passes)
  • cmd/internal/obj (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which rewrites generic values into their machine-specific variants. For example, on amd64 memory operands are possible, so many load-store operations may be combined.

Note that the lower pass runs all machine-specific rewrite rules, and thus it currently applies lots of optimizations too.

Once the SSA has been "lowered" and is more specific to the target architecture, the final code optimization passes are run. This includes yet another dead code elimination pass, moving values closer to their uses, the removal of local variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame layout, which assigns stack offsets to local variables, and pointer liveness analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into a series of obj.Prog instructions. These are passed to the assembler (cmd/internal/obj), which turns them into machine code and writes out the final object file. The object file will also contain reflect data, export data, and debugging information.

Further reading

To dig deeper into how the SSA package works, including its passes and rules, head to cmd/compile/internal/ssa/README.md.

Documentation

Overview

Compile, typically invoked as “go tool compile,” compiles a single Go package comprising the files named on the command line. It then writes a single object file named for the basename of the first source file with a .o suffix. The object file can then be combined with other objects into a package archive or passed directly to the linker (“go tool link”). If invoked with -pack, the compiler writes an archive directly, bypassing the intermediate object file.

The generated files contain type information about the symbols exported by the package and about types used by symbols imported by the package from other packages. It is therefore not necessary when compiling client C of package P to read the files of P's dependencies, only the compiled output of P.

Command Line

Usage:

go tool compile [flags] file...

The specified files must be Go source files and all part of the same package. The same compiler is used for all target operating systems and architectures. The GOOS and GOARCH environment variables set the desired target.

Flags:

-D path
	Set relative path for local imports.
-I dir1 -I dir2
	Search for imported packages in dir1, dir2, etc.,
	after consulting $GOROOT/pkg/$GOOS_$GOARCH.
-L
	Show complete file path in error messages.
-N
	Disable optimizations.
-S
	Print assembly listing to standard output (code only).
-S -S
	Print assembly listing to standard output (code and data).
-V
	Print compiler version and exit.
-asmhdr file
	Write assembly header to file.
-asan
	Insert calls to C/C++ address sanitizer.
-buildid id
	Record id as the build id in the export metadata.
-blockprofile file
	Write block profile for the compilation to file.
-c int
	Concurrency during compilation. Set 1 for no concurrency (default is 1).
-complete
	Assume package has no non-Go components.
-cpuprofile file
	Write a CPU profile for the compilation to file.
-dynlink
	Allow references to Go symbols in shared libraries (experimental).
-e
	Remove the limit on the number of errors reported (default limit is 10).
-goversion string
	Specify required go tool version of the runtime.
	Exits when the runtime go version does not match goversion.
-h
	Halt with a stack trace at the first error detected.
-importcfg file
	Read import configuration from file.
	In the file, set importmap, packagefile to specify import resolution.
-installsuffix suffix
	Look for packages in $GOROOT/pkg/$GOOS_$GOARCH_suffix
	instead of $GOROOT/pkg/$GOOS_$GOARCH.
-l
	Disable inlining.
-lang version
	Set language version to compile, as in -lang=go1.12.
	Default is current version.
-linkobj file
	Write linker-specific object to file and compiler-specific
	object to usual output file (as specified by -o).
	Without this flag, the -o output is a combination of both
	linker and compiler input.
-m
	Print optimization decisions. Higher values or repetition
	produce more detail.
-memprofile file
	Write memory profile for the compilation to file.
-memprofilerate rate
	Set runtime.MemProfileRate for the compilation to rate.
-msan
	Insert calls to C/C++ memory sanitizer.
-mutexprofile file
	Write mutex profile for the compilation to file.
-nolocalimports
	Disallow local (relative) imports.
-o file
	Write object to file (default file.o or, with -pack, file.a).
-p path
	Set expected package import path for the code being compiled,
	and diagnose imports that would cause a circular dependency.
-pack
	Write a package (archive) file rather than an object file.
-race
	Compile with race detector enabled.
-s
	Warn about composite literals that can be simplified.
-shared
	Generate code that can be linked into a shared library.
-spectre list
	Enable spectre mitigations in list (all, index, ret).
-traceprofile file
	Write an execution trace to file.
-trimpath prefix
	Remove prefix from recorded source file paths.

Flags related to debugging information:

-dwarf
	Generate DWARF symbols.
-dwarflocationlists
	Add location lists to DWARF in optimized mode.
-gendwarfinl int
	Generate DWARF inline info records (default 2).

Flags to debug the compiler itself:

-E
	Debug symbol export.
-K
	Debug missing line numbers.
-d list
	Print debug information about items in list. Try -d help for further information.
-live
	Debug liveness analysis.
-v
	Increase debug verbosity.
-%
	Debug non-static initializers.
-W
	Debug parse tree after type checking.
-f
	Debug stack frames.
-i
	Debug line number stack.
-j
	Debug runtime-initialized variables.
-r
	Debug generated wrappers.
-w
	Debug type checking.

Compiler Directives

The compiler accepts directives in the form of comments. To distinguish them from non-directive comments, directives require no space between the comment opening and the name of the directive. However, since they are comments, tools unaware of the directive convention or of a particular directive can skip over a directive like any other comment.

Line directives come in several forms:

//line :line
//line :line:col
//line filename:line
//line filename:line:col
/*line :line*/
/*line :line:col*/
/*line filename:line*/
/*line filename:line:col*/

In order to be recognized as a line directive, the comment must start with //line or /*line followed by a space, and must contain at least one colon. The //line form must start at the beginning of a line.

A line directive specifies the source position for the character immediately following the comment as having come from the specified file, line and column: For a //line comment, this is the first character of the next line, and for a /*line comment this is the character position immediately following the closing */. If no filename is given, the recorded filename is empty if there is also no column number; otherwise it is the most recently recorded filename (actual filename or filename specified by previous line directive). If a line directive doesn't specify a column number, the column is "unknown" until the next directive and the compiler does not report column numbers for that range.

The line directive text is interpreted from the back: First the trailing :ddd is peeled off from the directive text if ddd is a valid number > 0. Then the second :ddd is peeled off the same way if it is valid. Anything before that is considered the filename (possibly including blanks and colons). Invalid line or column values are reported as errors.

Examples:

//line foo.go:10      the filename is foo.go, and the line number is 10 for the next line
//line C:foo.go:10    colons are permitted in filenames, here the filename is C:foo.go, and the line is 10
//line  a:100 :10     blanks are permitted in filenames, here the filename is " a:100 " (excluding quotes)
/*line :10:20*/x      the position of x is in the current file with line number 10 and column number 20
/*line foo: 10 */     this comment is recognized as an invalid line directive (extra blanks around line number)

Line directives typically appear in machine-generated code, so that compilers and debuggers will report positions in the original input to the generator.

The line directive is a historical special case; all other directives are of the form //go:name, indicating that they are defined by the Go toolchain. Each directive must be placed on its own line, with only leading spaces and tabs allowed before the comment. Each directive applies to the Go code that immediately follows it, which typically must be a declaration.

//go:noescape

The //go:noescape directive must be followed by a function declaration without a body (meaning that the function has an implementation not written in Go). It specifies that the function does not allow any of the pointers passed as arguments to escape into the heap or into the values returned from the function. This information can be used during the compiler's escape analysis of Go code calling the function.

//go:uintptrescapes

The //go:uintptrescapes directive must be followed by a function declaration. It specifies that the function's uintptr arguments may be pointer values that have been converted to uintptr and must be on the heap and kept alive for the duration of the call, even though from the types alone it would appear that the object is no longer needed during the call. The conversion from pointer to uintptr must appear in the argument list of any call to this function. This directive is necessary for some low-level system call implementations and should be avoided otherwise.

//go:noinline

The //go:noinline directive must be followed by a function declaration. It specifies that calls to the function should not be inlined, overriding the compiler's usual optimization rules. This is typically only needed for special runtime functions or when debugging the compiler.
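A minimal sketch of the placement rules, using a hypothetical add function: the directive sits on its own line directly above the declaration, with no space between // and go:noinline.

```go
package main

import "fmt"

// add is small enough that the compiler would normally consider
// inlining it; the directive below asks it not to.
//
//go:noinline
func add(a, b int) int { return a + b }

// A comment written as "// go:noinline" (with a space) is an ordinary
// comment and has no effect on twice.
func twice(a int) int { return add(a, a) }

func main() {
	fmt.Println(twice(21))
}
```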

//go:norace

The //go:norace directive must be followed by a function declaration. It specifies that the function's memory accesses must be ignored by the race detector. This is most commonly used in low-level code invoked at times when it is unsafe to call into the race detector runtime.

//go:nosplit

The //go:nosplit directive must be followed by a function declaration. It specifies that the function must omit its usual stack overflow check. This is most commonly used by low-level runtime code invoked at times when it is unsafe for the calling goroutine to be preempted.

//go:linkname localname [importpath.name]

This special directive does not apply to the Go code that follows it. Instead, the //go:linkname directive instructs the compiler to use “importpath.name” as the object file symbol name for the variable or function declared as “localname” in the source code. If the “importpath.name” argument is omitted, the directive uses the symbol's default object file symbol name and only has the effect of making the symbol accessible to other packages. Because this directive can subvert the type system and package modularity, it is only enabled in files that have imported "unsafe".

Functions

func Main

func Main()

Directories

Path Synopsis
Package flag implements command-line flag parsing.
internal
abi
abt
arm
compare
Package compare contains code for generating comparison routines for structs, strings and interfaces.
devirtualize
Package devirtualize implements a simple "devirtualization" optimization pass, which replaces interface method calls with direct concrete-type method calls where possible.
gc
importer
package importer implements Import for gc-generated object files.
ir
ssa
types2
Package types declares the data types and implements the algorithms for type-checking of Go packages.
x86
