pattern

package
v0.5.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 14, 2024 License: MIT Imports: 10 Imported by: 24

Documentation

Overview

Package pattern implements a simple language for pattern matching Go ASTs.

Design decisions and trade-offs

The language is designed specifically for the task of filtering ASTs to simplify the implementation of analyses in staticcheck. It is also intended to be trivial to parse and execute.

To that end, we make certain decisions that make the language more suited to its task, while making certain queries infeasible.

Furthermore, it is fully expected that the majority of analyses will still require ordinary Go code to further process the filtered AST, to make use of type information and to enforce complex invariants. It is not our goal to design a scripting language for writing entire checks in.

The language

At its core, patterns are a representation of Go ASTs, allowing for the use of placeholders to enable pattern matching. Their syntax is inspired by LISP and Haskell, but unlike LISP, the core unit of patterns isn't the list, but the node. There is a fixed set of nodes, identified by name, and with the exception of the Or node, all nodes have a fixed number of arguments. In addition to nodes, there are atoms, which represent basic units such as strings or the nil value.

Pattern matching is implemented via bindings, represented by the Binding node. A Binding can match nodes and associate them with names, to later recall the nodes. This allows for expressing "this node must be equal to that node" constraints.

To simplify writing and reading patterns, a small amount of additional syntax exists on top of nodes and atoms. This additional syntax doesn't add any new features of its own, it simply provides shortcuts to creating nodes and atoms.

To show an example of a pattern, first consider this snippet of Go code:

if x := fn(); x != nil {
	for _, v := range x {
		println(v, x)
	}
}

The corresponding AST expressed as an idiomatic pattern would look as follows:

(IfStmt
	(AssignStmt (Ident "x") ":=" (CallExpr (Ident "fn") []))
	(BinaryExpr (Ident "x") "!=" (Ident "nil"))
	(RangeStmt
		(Ident "_") (Ident "v") ":=" (Ident "x")
		(CallExpr (Ident "println") [(Ident "v") (Ident "x")]))
	nil)

Two things are worth noting about this representation. First, the [el1 el2 ...] syntax is a short-hand for creating lists. It is a short-hand for el1:el2:[], which itself is a short-hand for (List el1 (List el2 (List nil nil)). Second, note the absence of a lot of lists in places that normally accept lists. For example, assignment assigns a number of right-hands to a number of left-hands, yet our AssignStmt is lacking any form of list. This is due to the fact that a single node can match a list of exactly one element. Thus, the two following forms have identical matching behavior:

(AssignStmt (Ident "x") ":=" (CallExpr (Ident "fn") []))
(AssignStmt [(Ident "x")] ":=" [(CallExpr (Ident "fn") [])])

This section serves as an overview of the language's syntax. More in-depth explanations of the matching behavior as well as an exhaustive list of node types follows in the coming sections.

Pattern matching

TODO write about pattern matching

- inspired by haskell syntax, but much, much simpler and naive

Node types

The language contains two kinds of nodes: those that map to nodes in the AST, and those that implement additional logic.

Nodes that map directly to AST nodes are named identically to the types in the go/ast package. What follows is an exhaustive list of these nodes:

(ArrayType len elt)
(AssignStmt lhs tok rhs)
(BasicLit kind value)
(BinaryExpr x op y)
(BranchStmt tok label)
(CallExpr fun args)
(CaseClause list body)
(ChanType dir value)
(CommClause comm body)
(CompositeLit type elts)
(DeferStmt call)
(Ellipsis elt)
(EmptyStmt)
(Field names type tag)
(ForStmt init cond post body)
(FuncDecl recv name type body)
(FuncLit type body)
(FuncType params results)
(GenDecl specs)
(GoStmt call)
(Ident name)
(IfStmt init cond body else)
(ImportSpec name path)
(IncDecStmt x tok)
(IndexExpr x index)
(InterfaceType methods)
(KeyValueExpr key value)
(MapType key value)
(RangeStmt key value tok x body)
(ReturnStmt results)
(SelectStmt body)
(SelectorExpr x sel)
(SendStmt chan value)
(SliceExpr x low high max)
(StarExpr x)
(StructType fields)
(SwitchStmt init tag body)
(TypeAssertExpr)
(TypeSpec name type)
(TypeSwitchStmt init assign body)
(UnaryExpr op x)
(ValueSpec names type values)

Additionally, there are the String, Token and nil atoms. Strings are double-quoted string literals, as in (Ident "someName"). Tokens are also represented as double-quoted string literals, but are converted to token.Token values in contexts that require tokens, such as in (BinaryExpr x "<" y), where "<" is transparently converted to token.LSS during matching. The keyword 'nil' denotes the nil value, which represents the absence of any value.

We also define the (List head tail) node, which is used to represent sequences of elements as a singly linked list. The head is a single element, and the tail is the remainder of the list. For example,

(List "foo" (List "bar" (List "baz" (List nil nil))))

represents a list of three elements, "foo", "bar" and "baz". There is dedicated syntax for writing lists, which looks as follows:

["foo" "bar" "baz"]

This syntax is itself syntactic sugar for the following form:

"foo":"bar":"baz":[]

This form is of particular interest for pattern matching, as it allows matching on the head and tail. For example,

"foo":"bar":_

would match any list with at least two elements, where the first two elements are "foo" and "bar". This is equivalent to writing

(List "foo" (List "bar" _))

Note that it is not possible to match from the end of the list. That is, there is no way to express a query such as "a list of any length where the last element is foo".

Note that unlike in LISP, nil and empty lists are distinct from one another. In patterns, with respect to lists, nil is akin to Go's untyped nil. It will match a nil ast.Node, but it will not match a nil []ast.Expr. Nil will, however, match pointers to named types such as *ast.Ident. Similarly, lists are akin to Go's slices. An empty list will match both a nil and an empty []ast.Expr, but it will not match a nil ast.Node.

Due to the difference between nil and empty lists, an empty list is represented as (List nil nil), i.e. a list with no head or tail. Similarly, a list of one element is represented as (List el (List nil nil)). Unlike in LISP, it cannot be represented by (List el nil).

Finally, there are nodes that implement special logic or matching behavior.

(Any) matches any value. The underscore (_) maps to this node, making the following two forms equivalent:

(Ident _)
(Ident (Any))

(Builtin name) matches a built-in identifier or function by name. This is a type-aware variant of (Ident name). Instead of only comparing the name, it resolves the object behind the name and makes sure it's a pre-declared identifier.

For example, in the following piece of code

func fn() {
	println(true)
	true := false
	println(true)
}

the pattern

(Builtin "true")

will match exactly once, on the first use of 'true' in the function. Subsequent occurrences of 'true' no longer refer to the pre-declared identifier.

(Object name) matches an identifier by name, but yields the types.Object it refers to.

(Symbol name) matches ast.Idents and ast.SelectorExprs that refer to a symbol with a given fully qualified name. For example, "net/url.PathEscape" matches the PathEscape function in the net/url package, and "(net/url.EscapeError).Error" refers to the Error method on the net/url.EscapeError type, either on an instance of the type, or on the type itself.

For example, the following patterns match the following lines of code:

(CallExpr (Symbol "fmt.Println") _) // pattern 1
(CallExpr (Symbol "(net/url.EscapeError).Error") _) // pattern 2

fmt.Println("hello, world") // matches pattern 1
var x url.EscapeError
x.Error() // matches pattern 2
(url.EscapeError).Error(x) // also matches pattern 2

(Binding name node) creates or uses a binding. Bindings work like variable assignments, allowing referring to already matched nodes. As an example, bindings are necessary to match self-assignment of the form "x = x", since we need to express that the right-hand side is identical to the left-hand side.

If a binding's node is not nil, the matcher will attempt to match a node according to the pattern. If a binding's node is nil, the binding will either recall an existing value, or match the Any node. It is an error to provide a non-nil node to a binding that has already been bound.

Referring back to the earlier example, the following pattern will match self-assignment of idents:

(AssignStmt (Binding "lhs" (Ident _)) "=" (Binding "lhs" nil))

Because bindings are a crucial component of pattern matching, there is special syntax for creating and recalling bindings. Lower-case names refer to bindings. If standing on its own, the name "foo" will be equivalent to (Binding "foo" nil). If a name is followed by an at-sign (@) then it will create a binding for the node that follows. Together, this allows us to rewrite the earlier example as follows:

(AssignStmt lhs@(Ident _) "=" lhs)

(Or nodes...) is a variadic node that tries matching each node until one succeeds. For example, the following pattern matches all idents of name "foo" or "bar":

(Ident (Or "foo" "bar"))

We could also have written

(Or (Ident "foo") (Ident "bar"))

and achieved the same result. We can also mix different kinds of nodes:

(Or (Ident "foo") (CallExpr (Ident "bar") _))

When using bindings inside of nodes used inside Or, all or none of the bindings will be bound. That is, partially matched nodes that ultimately failed to match will not produce any bindings observable outside of the matching attempt. We can thus write

(Or (Ident name) (CallExpr name))

and 'name' will either be a String if the first option matched, or an Ident or SelectorExpr if the second option matched.

(Not node)

The Not node negates a match. For example, (Not (Ident _)) will match all nodes that aren't identifiers.

ChanDir(0)

Automatic unnesting of AST nodes

The Go AST has several types of nodes that wrap other nodes. To simplify matching, we automatically unwrap some of these nodes.

These nodes are ExprStmt (for using expressions in a statement context), ParenExpr (for parenthesized expressions), DeclStmt (for declarations in a statement context), and LabeledStmt (for labeled statements).

Thus, the query

(FuncLit _ [(CallExpr _ _)]

will match a function literal containing a single function call, even though in the actual Go AST, the CallExpr is nested inside an ExprStmt, as function bodies are made up of sequences of statements.

On the flip-side, there is no way to specifically match these wrapper nodes. For example, there is no way of searching for unnecessary parentheses, like in the following piece of Go code:

((x)) += 2

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NodeToAST

func NodeToAST(node Node, state State) interface{}

Types

type Any

type Any struct{}

func (Any) Match

func (Any) Match(m *Matcher, node interface{}) (interface{}, bool)

func (Any) String

func (Any) String() string

type ArrayType

type ArrayType struct {
	Len Node
	Elt Node
}

func (ArrayType) String

func (typ ArrayType) String() string

type AssignStmt

type AssignStmt struct {
	Lhs Node
	Tok Node
	Rhs Node
}

func (AssignStmt) String

func (stmt AssignStmt) String() string

type BasicLit

type BasicLit struct {
	Kind  Node
	Value Node
}

func (BasicLit) String

func (lit BasicLit) String() string

type BinaryExpr

type BinaryExpr struct {
	X  Node
	Op Node
	Y  Node
}

func (BinaryExpr) String

func (expr BinaryExpr) String() string

type Binding

type Binding struct {
	Name string
	Node Node
	// contains filtered or unexported fields
}

func (Binding) Match

func (b Binding) Match(m *Matcher, node interface{}) (interface{}, bool)

func (Binding) String

func (bind Binding) String() string

type BranchStmt

type BranchStmt struct {
	Tok   Node
	Label Node
}

func (BranchStmt) String

func (stmt BranchStmt) String() string

type Builtin

type Builtin struct {
	Name Node
}

func (Builtin) Match

func (builtin Builtin) Match(m *Matcher, node interface{}) (interface{}, bool)

func (Builtin) String

func (builtin Builtin) String() string

type CallExpr

type CallExpr struct {
	Fun  Node
	Args Node
}

func (CallExpr) String

func (expr CallExpr) String() string

type CaseClause

type CaseClause struct {
	List Node
	Body Node
}

func (CaseClause) String

func (clause CaseClause) String() string

type ChanType

type ChanType struct {
	Dir   Node
	Value Node
}

func (ChanType) String

func (typ ChanType) String() string

type CommClause

type CommClause struct {
	Comm Node
	Body Node
}

func (CommClause) String

func (clause CommClause) String() string

type CompositeLit

type CompositeLit struct {
	Type Node
	Elts Node
}

func (CompositeLit) String

func (lit CompositeLit) String() string

type DeferStmt

type DeferStmt struct {
	Call Node
}

func (DeferStmt) String

func (stmt DeferStmt) String() string

type Ellipsis

type Ellipsis struct {
	Elt Node
}

func (Ellipsis) String

func (el Ellipsis) String() string

type EmptyStmt

type EmptyStmt struct {
}

func (EmptyStmt) String

func (stmt EmptyStmt) String() string

type Field

type Field struct {
	Names Node
	Type  Node
	Tag   Node
}

func (Field) String

func (field Field) String() string

type ForStmt

type ForStmt struct {
	Init Node
	Cond Node
	Post Node
	Body Node
}

func (ForStmt) String

func (stmt ForStmt) String() string

type FuncDecl

type FuncDecl struct {
	Recv Node
	Name Node
	Type Node
	Body Node
}

func (FuncDecl) String

func (decl FuncDecl) String() string

type FuncLit

type FuncLit struct {
	Type Node
	Body Node
}

func (FuncLit) String

func (lit FuncLit) String() string

type FuncType

type FuncType struct {
	Params  Node
	Results Node
}

func (FuncType) String

func (typ FuncType) String() string

type GenDecl

type GenDecl struct {
	Tok   Node
	Specs Node
}

func (GenDecl) String

func (decl GenDecl) String() string

type GoStmt

type GoStmt struct {
	Call Node
}

func (GoStmt) String

func (stmt GoStmt) String() string

type Ident

type Ident struct {
	Name Node
}

func (Ident) String

func (id Ident) String() string

type IfStmt

type IfStmt struct {
	Init Node
	Cond Node
	Body Node
	Else Node
}

func (IfStmt) String

func (stmt IfStmt) String() string

type ImportSpec

type ImportSpec struct {
	Name Node
	Path Node
}

func (ImportSpec) String

func (spec ImportSpec) String() string

type IncDecStmt

type IncDecStmt struct {
	X   Node
	Tok Node
}

func (IncDecStmt) String

func (stmt IncDecStmt) String() string

type IndexExpr

type IndexExpr struct {
	X     Node
	Index Node
}

func (IndexExpr) String

func (expr IndexExpr) String() string

type IndexListExpr added in v0.3.0

type IndexListExpr struct {
	X       Node
	Indices Node
}

func (IndexListExpr) String added in v0.3.0

func (expr IndexListExpr) String() string

type IntegerLiteral added in v0.3.0

type IntegerLiteral struct {
	Value Node
}

An IntegerLiteral is a constant expression made up of only integer basic literals and the "+" and "-" unary operators. That is, 0, -4, -+42 are all integer literals, but 1 + 2 is not.

func (IntegerLiteral) Match added in v0.3.0

func (lit IntegerLiteral) Match(m *Matcher, node interface{}) (interface{}, bool)

func (IntegerLiteral) String added in v0.3.0

func (lit IntegerLiteral) String() string

type InterfaceType

type InterfaceType struct {
	Methods Node
}

func (InterfaceType) String

func (typ InterfaceType) String() string

type KeyValueExpr

type KeyValueExpr struct {
	Key   Node
	Value Node
}

func (KeyValueExpr) String

func (expr KeyValueExpr) String() string

type List

type List struct {
	Head Node
	Tail Node
}

func (List) Match

func (l List) Match(m *Matcher, node interface{}) (interface{}, bool)

func (List) String

func (l List) String() string

type MapType

type MapType struct {
	Key   Node
	Value Node
}

func (MapType) String

func (typ MapType) String() string

type Matcher

type Matcher struct {
	TypesInfo *types.Info
	State     State
	// contains filtered or unexported fields
}

func Match

func Match(a Pattern, b ast.Node) (*Matcher, bool)

func (*Matcher) Match

func (m *Matcher) Match(a Pattern, b ast.Node) bool

type Nil

type Nil struct {
}

func (Nil) Match

func (Nil) Match(m *Matcher, node interface{}) (interface{}, bool)

func (Nil) String

func (nil Nil) String() string

type Node

type Node interface {
	String() string
	// contains filtered or unexported methods
}

func ASTToNode

func ASTToNode(node interface{}) Node

type Not

type Not struct {
	Node Node
}

func (Not) Match

func (not Not) Match(m *Matcher, node interface{}) (interface{}, bool)

func (Not) String

func (not Not) String() string

type Object

type Object struct {
	Name Node
}

func (Object) Match

func (obj Object) Match(m *Matcher, node interface{}) (interface{}, bool)

func (Object) String

func (obj Object) String() string

type Or

type Or struct {
	Nodes []Node
}

func (Or) Match

func (or Or) Match(m *Matcher, node interface{}) (interface{}, bool)

func (Or) String

func (or Or) String() string

type Parser

type Parser struct {
	// Allow nodes that rely on type information
	AllowTypeInfo bool
	// contains filtered or unexported fields
}

func (*Parser) Parse

func (p *Parser) Parse(s string) (Pattern, error)

type Pattern

type Pattern struct {
	Root Node
	// Relevant contains instances of ast.Node that could potentially
	// initiate a successful match of the pattern.
	Relevant map[reflect.Type]struct{}

	// Mapping from binding index to binding name
	Bindings []string
}

func MustParse

func MustParse(s string) Pattern

type RangeStmt

type RangeStmt struct {
	Key   Node
	Value Node
	Tok   Node
	X     Node
	Body  Node
}

func (RangeStmt) String

func (stmt RangeStmt) String() string

type ReturnStmt

type ReturnStmt struct {
	Results Node
}

func (ReturnStmt) String

func (stmt ReturnStmt) String() string

type SelectStmt

type SelectStmt struct {
	Body Node
}

func (SelectStmt) String

func (stmt SelectStmt) String() string

type SelectorExpr

type SelectorExpr struct {
	X   Node
	Sel Node
}

func (SelectorExpr) String

func (expr SelectorExpr) String() string

type SendStmt

type SendStmt struct {
	Chan  Node
	Value Node
}

func (SendStmt) String

func (stmt SendStmt) String() string

type SliceExpr

type SliceExpr struct {
	X    Node
	Low  Node
	High Node
	Max  Node
}

func (SliceExpr) String

func (expr SliceExpr) String() string

type StarExpr

type StarExpr struct {
	X Node
}

func (StarExpr) String

func (expr StarExpr) String() string

type State

type State = map[string]any

type String

type String string

func (String) Match

func (s String) Match(m *Matcher, node interface{}) (interface{}, bool)

func (String) String

func (s String) String() string

type StructType

type StructType struct {
	Fields Node
}

func (StructType) String

func (typ StructType) String() string

type SwitchStmt

type SwitchStmt struct {
	Init Node
	Tag  Node
	Body Node
}

func (SwitchStmt) String

func (stmt SwitchStmt) String() string

type Symbol added in v0.3.2

type Symbol struct {
	Name Node
}

func (Symbol) Match added in v0.3.2

func (fn Symbol) Match(m *Matcher, node interface{}) (interface{}, bool)

func (Symbol) String added in v0.3.2

func (fn Symbol) String() string

type Token

type Token token.Token

func (Token) Match

func (tok Token) Match(m *Matcher, node interface{}) (interface{}, bool)

func (Token) String

func (tok Token) String() string

type TrulyConstantExpression added in v0.3.0

type TrulyConstantExpression struct {
	Value Node
}

A TrulyConstantExpression is a constant expression that does not make use of any identifiers. It is constant even under varying build tags.

func (TrulyConstantExpression) Match added in v0.3.0

func (texpr TrulyConstantExpression) Match(m *Matcher, node interface{}) (interface{}, bool)

func (TrulyConstantExpression) String added in v0.3.0

func (expr TrulyConstantExpression) String() string

type TypeAssertExpr

type TypeAssertExpr struct {
	X    Node
	Type Node
}

func (TypeAssertExpr) String

func (expr TypeAssertExpr) String() string

type TypeSpec

type TypeSpec struct {
	Name Node
	Type Node
}

func (TypeSpec) String

func (spec TypeSpec) String() string

type TypeSwitchStmt

type TypeSwitchStmt struct {
	Init   Node
	Assign Node
	Body   Node
}

func (TypeSwitchStmt) String

func (stmt TypeSwitchStmt) String() string

type UnaryExpr

type UnaryExpr struct {
	Op Node
	X  Node
}

func (UnaryExpr) String

func (expr UnaryExpr) String() string

type ValueSpec

type ValueSpec struct {
	Names  Node
	Type   Node
	Values Node
}

func (ValueSpec) String

func (spec ValueSpec) String() string

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL