grammar

package
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 5, 2023 License: MIT Imports: 9 Imported by: 2

Documentation

Overview

Package grammar implements context-free grammars and associated constructs. A context-free grammar describes a language using a series of production rules. They are often represented using BNF, which looks something like this:

SUM  ::= EXPR '+' EXPR
EXPR ::= int | identifier

This package provides CFG for representing and manipulating such a grammar. In ictiobus they are used to automatically generate parsers by querying a CFG for information on its rules.

Index

Constants

This section is empty.

Variables

View Source
var (
	// Epsilon represents the empty production.
	Epsilon = Production{""}

	// Error is a production used to indicate an error.
	Error = Production{}
)

Functions

This section is empty.

Types

type CFG added in v1.0.0

type CFG struct {

	// Start is the name of the start symbol. If not set, It is assumed to be S.
	Start string
	// contains filtered or unexported fields
}

CFG is a context-free grammar for a language. It holds the rules and derivations that make up the grammar, and maintains the terminal and non-terminal symbols that are in it in the order they are defined.

The zero-value is a CFG ready to be used.

func MustParse

func MustParse(gr string) CFG

MustParse is identical to Parse but panics if an error is encountered.

func Parse

func Parse(gr string) (CFG, error)

Parse parses a 'grammar string' into a Grammar object. The string must have a semicolon between rules, spaces between each symbol, non-terminals must contain at least one upper-case letter. Epsilon "ε" is used for the epsilon production.

Example:

S -> A | B ;
A -> a | ε ;
B -> A b | c ;

func (*CFG) AddRule added in v1.0.0

func (g *CFG) AddRule(nonterminal string, production []string)

AddRule adds the given production for a nonterminal. If the nonterminal has already been given, the production is added as an alternative for that nonterminal with lower priority than all others already added.

All rules require at least one symbol in the production. For episilon production, give only the empty string.

func (*CFG) AddTerm added in v1.0.0

func (g *CFG) AddTerm(terminal string, class lex.TokenClass)

AddTerm adds the given terminal along with the TokenClass that corresponds to it; tokens must be of that class in order to match the terminal.

The mapping of terminal symbol IDs to TokenClasses must be 1-to-1; i.e. It is an error to map multiple terms to the same TokenClass, and it is an error to map the same term to multiple TokenClasses.

As a result, redefining the same term will cause the old one to be removed, and during validation if multiple terminals are matched to the same tokenClass it will be considered an error.

It is an error to map any terminal to lex.TokenUndefined or lex.TokenEndOfText and attempting to do so will cause a panic.

func (CFG) Augmented added in v1.0.0

func (g CFG) Augmented() CFG

Augmented returns a new grammar that is a copy of this one but with the start symbol S changed to a new rule, S' -> S. All other rules are kept the same.

func (CFG) Copy added in v1.0.0

func (g CFG) Copy() CFG

Copy makes a deep copy of the grammar.

func (CFG) GenerateUniqueName added in v1.0.0

func (g CFG) GenerateUniqueName(original string) string

GenerateUniqueName generates a name for a non-terminal gauranteed to be unique within the grammar, based on original if one is provided.

func (CFG) GenerateUniqueTerminal added in v1.0.0

func (g CFG) GenerateUniqueTerminal(original string) string

GenerateUniqueTerminal generates a name for a terminal gauranteed to be unique within the grammar, based on the given original if one is provided.

func (CFG) HasUnreachableNonTerminals added in v1.0.0

func (g CFG) HasUnreachableNonTerminals() bool

HasUnreachables returns whether the grammar currently has unreachle non-terminals.

func (CFG) IsNonTerminal added in v1.0.0

func (g CFG) IsNonTerminal(sym string) bool

IsNonTerminal returns whether the given symbol is a non-terminal used in the CFG.

func (CFG) IsTerminal added in v1.0.0

func (g CFG) IsTerminal(sym string) bool

IsTerminal returns whether the given symbol is a terminal used in the CFG.

func (CFG) LR0Items added in v1.0.0

func (g CFG) LR0Items() []LR0Item

LR0Items returns all LR0 items of rules in the grammar. These items can be used by LR parser generator algorithms to create a parsing DFA for the grammar.

func (CFG) LeftFactor added in v1.0.0

func (g CFG) LeftFactor() CFG

LeftFactor returns a new Grammar equivalent to this one but with all unclear alternative choices for a top-down parser are left factored to equivalent pairs of statements.

This is an implementation of Algorithm 4.21 from the purple dragon book, "Left factoring a grammar".

func (CFG) MarshalBinary added in v1.0.0

func (g CFG) MarshalBinary() ([]byte, error)

MarshalBinary converts g into a slice of bytes that can be decoded with UnmarshalBinary.

func (CFG) NonTerminals added in v1.0.0

func (g CFG) NonTerminals() []string

NonTerminals returns a list of all the non-terminal symbols in the grammar. Elements in the returned slice will be in a stable order, but necessarily the same order as they were added in. All will be upper-case.

func (CFG) NonTerminalsByPriority added in v1.0.0

func (g CFG) NonTerminalsByPriority() []string

NonTerminalsByPriority returns list of all the non-terminal symbols in the order they were defined in. All will be upper-case.

func (CFG) NonTerminalsByReversePriority added in v1.0.0

func (g CFG) NonTerminalsByReversePriority() []string

NonTerminalsByReversePriority returns list of all the non-terminal symbols in reverse order from the order they were defined in. This is handy because it can have the effect of causing iteration to do so in a manner that a human might do looking at a grammar, reversed.

func (CFG) ReachableFrom added in v1.0.0

func (g CFG) ReachableFrom(start string, end string) (bool, []Rule)

ReachableFrom returns whether the symbol end can be derived from symbol start with any combination of derivations. Returns whether it can be reached, and if it can, additional returns a set of Rules that each contain exactly one production; each is a derivation that must be done to a symbol of the prior derived to string to ultimately end up at end.

func (CFG) RemoveEpsilons added in v1.0.0

func (g CFG) RemoveEpsilons() CFG

RemoveEpsilons returns a grammar that derives strings equivalent to the first one (with the exception of the empty string) but with all epsilon productions automatically eliminated.

Call Validate before this or it may go poorly.

func (CFG) RemoveLeftRecursion added in v1.0.0

func (g CFG) RemoveLeftRecursion() CFG

RemoveLeftRecursion returns a grammar that has no left recursion, suitable for operations on by a top-down parsing method.

This will force immediate removal of epsilon-productions and unit-productions as well, as this algorithem only works on CFGs without those.

This is an implementation of Algorithm 4.19 from the purple dragon book, "Eliminating left recursion".

func (*CFG) RemoveRule added in v1.0.0

func (g *CFG) RemoveRule(nonterminal string)

RemoveRule eliminates all productions of the given nonterminal from the grammar. The nonterminal will no longer be considered to be a part of the Grammar.

If the grammar already does not contain the given non-terminal this function has no effect.

func (*CFG) RemoveTerm added in v1.0.0

func (g *CFG) RemoveTerm(t string)

RemoveTerm eliminates the given terminal from the grammar. The terminal will no longer be considered a valid symbol for a rule in the Grammar to produce.

If the grammar already does not contain the given nonterminal this function has no effect.

func (CFG) RemoveUnitProductions added in v1.0.0

func (g CFG) RemoveUnitProductions() CFG

RemoveUnitProductions returns a Grammar that derives strings equivalent to this one but with all unit production rules removed.

func (CFG) RemoveUnreachableNonTerminals added in v1.0.0

func (g CFG) RemoveUnreachableNonTerminals() CFG

RemoveUnreachableNonTerminals returns a grammar with all unreachable non-terminals removed.

func (*CFG) RemoveUnusedTerminals added in v1.0.0

func (g *CFG) RemoveUnusedTerminals()

RemoveUnusedTerminals removes all terminals that are not currently used by any rule in the grammar.

func (CFG) Rule added in v1.0.0

func (g CFG) Rule(nonterminal string) Rule

Rule returns the grammar rule for the given non-terminal symbol. If there is no rule defined for that nonterminal, a Rule with an empty NonTerminal field is returned; otherwise it will be the same string as the one passed in to the function.

func (CFG) StartSymbol added in v1.0.0

func (g CFG) StartSymbol() string

StartSymbol returns the defined start symbol for the grammar. If one is set in g.Start, that is returned, otherwise "S" is.

func (CFG) String added in v1.0.0

func (g CFG) String() string

String returns a string representation of the grammar.

func (CFG) Term added in v1.0.0

func (g CFG) Term(terminal string) lex.TokenClass

Term returns the TokenClass that the given terminal symbol maps to. If the symbol is not defined in this grammar, lex.TokenUndefined is returned.

func (CFG) TermFor added in v1.0.0

func (g CFG) TermFor(tc lex.TokenClass) string

TermFor returns the term used in the grammar to represent the given TokenClass. If tc is not a TokenClass in the grammar, "" is returned.

func (CFG) Terminals added in v1.0.0

func (g CFG) Terminals() []string

Terminals returns the set of terminal symbols in the grammar. Elements in the returned slice will be in a stable order, but it will not necessarily be related to the order they were added to the grammar.

func (CFG) UnitProductions added in v1.0.0

func (g CFG) UnitProductions() []Rule

UnitProductions returns all production rules that are of the form A -> B, where A and B are both non-terminals. The returned list contains rules mapping the non-terminal to the other non-terminal; all other productions from the grammar will not be present.

func (*CFG) UnmarshalBinary added in v1.0.0

func (g *CFG) UnmarshalBinary(data []byte) error

UnmarshalBinary decodes a slice of bytes created by MarshalBinary into g. All of g's fields will be replaced by the fields decoded from data.

func (CFG) UnreachableNonTerminals added in v1.0.0

func (g CFG) UnreachableNonTerminals() []string

UnreachableNonTerminals returns all non-terminals (excluding the start symbol) that are currently unreachable due to not being produced by any other grammar rule.

func (CFG) Validate added in v1.0.0

func (g CFG) Validate() error

Validates that the current rules form a complete grammar with no missing definitions. TODO: should also dupe-check rules.

type LR0Item

type LR0Item struct {
	// NonTerminal is the symbol at the head of the grammar rule that this item
	// is for.
	NonTerminal string

	// Left is all symbols in the production of the rule that are to the left of
	// the dot.
	Left []string

	// Right is all symbols in the production of the rule that are to the right
	// of the dot.
	Right []string
}

LR0Item is an LR(0) item from a grammar. This is an object used for generating handle recognition systems for bottom-up parsers. Each LR0Item has a NonTerminal of the rule it relates to, and a left and right side of a dot. The dot denotes handle recognition; all symbols on the left have been seen, and all symbols on the right have yet to be seen.

func MustParseLR0Item

func MustParseLR0Item(s string) LR0Item

MustParseLR0Item is identical to ]ParseLR0Item], but panics if any parse error occurs.

func ParseLR0Item

func ParseLR0Item(s string) (LR0Item, error)

ParseLR1Item parses a string of the form "NONTERM -> ALPHA.BETA" into an LR0Item.

func (LR0Item) Equal

func (lr0 LR0Item) Equal(o any) bool

Equal returns whether the given LR0Item is equal to another LR0Item or *LR0Item.

func (LR0Item) MarshalBinary added in v0.3.0

func (lr0 LR0Item) MarshalBinary() ([]byte, error)

MarshalBinary converts lr0 into a slice of bytes that can be decoded with UnmarshalBinary.

func (LR0Item) String

func (item LR0Item) String() string

String returns the string representation of an LR0 Item.

func (*LR0Item) UnmarshalBinary added in v0.3.0

func (lr0 *LR0Item) UnmarshalBinary(data []byte) error

UnmarshalBinary decodes a slice of bytes created by MarshalBinary into lr0. All of lr0's fields will be replaced by the fields decoded from data.

type LR1Item

type LR1Item struct {
	LR0Item
	Lookahead string
}

LR1Item is an LR(1) item from a grammar. This is the same as an LR0Item, but with a Lookahead symbol that determines when the item can be used.

func MustParseLR1Item

func MustParseLR1Item(s string) LR1Item

MustParseLR1Item is identical to ParseLR1Item, but panics if any parse error occurs.

func ParseLR1Item

func ParseLR1Item(s string) (LR1Item, error)

ParseLR1Item parses a string of the form "NONTERM -> ALPHA.BETA, LOOKAHEAD" into an LR1Item.

func (LR1Item) Copy

func (lr1 LR1Item) Copy() LR1Item

Copy returns a deep copy of the LR1Item.

func (LR1Item) Equal

func (lr1 LR1Item) Equal(o any) bool

Equal returns whether the given LR0Item is equal to another LR1Item or *LR1Item.

func (LR1Item) MarshalBinary added in v0.3.0

func (lr1 LR1Item) MarshalBinary() ([]byte, error)

MarshalBinary converts lr1 into a slice of bytes that can be decoded with UnmarshalBinary.

func (LR1Item) String

func (item LR1Item) String() string

String returns the string representation of an LR1 Item.

func (*LR1Item) UnmarshalBinary added in v0.3.0

func (lr1 *LR1Item) UnmarshalBinary(data []byte) error

UnmarshalBinary decodes a slice of bytes created by MarshalBinary into lr1. All of lr1's fields will be replaced by the fields decoded from data.

type Production

type Production []string

Production is a production of a grammar rule.

func (Production) Copy

func (p Production) Copy() Production

Copy returns a deep-copied duplicate of this production.

func (Production) Equal

func (p Production) Equal(o any) bool

Equal returns whether Rule is equal to another value. It will not be equal if the other value cannot be cast to Production or *Production.

func (Production) HasSymbol

func (p Production) HasSymbol(sym string) bool

HasSymbol returns whether the production has the given symbol in it.

func (Production) IsUnit

func (p Production) IsUnit() bool

IsUnit returns whether this production is a unit production.

func (Production) MarshalBinary added in v0.3.0

func (p Production) MarshalBinary() ([]byte, error)

MarshalBinary converts p into a slice of bytes that can be decoded with UnmarshalBinary.

func (Production) String

func (p Production) String() string

String returns the string representation of the Production.

func (*Production) UnmarshalBinary added in v0.3.0

func (p *Production) UnmarshalBinary(data []byte) error

UnmarshalBinary decodes a slice of bytes created by MarshalBinary into p. All of p's fields will be replaced by the fields decoded from data.

type Rule

type Rule struct {
	NonTerminal string
	Productions []Production
}

Rule is a set of derivation rules of a grammar for some NonTerminal.

func MustParseRule added in v0.8.0

func MustParseRule(r string) Rule

MustParseRule is like parseRule but panics if it can't.

func ParseRule added in v0.8.0

func ParseRule(r string) (Rule, error)

ParseRule parses a Rule from a string like "S -> X | Y".

func (Rule) CanProduce

func (r Rule) CanProduce(p Production) bool

CanProduce returns whether this rule can produce the given Production.

func (Rule) CanProduceSymbol

func (r Rule) CanProduceSymbol(termOrNonTerm string) bool

CanProduceSymbol whether any alternative in productions produces the given term/non-terminal

func (Rule) Copy

func (r Rule) Copy() Rule

Copy returns a deep-copy duplicate of the given Rule.

func (Rule) Equal

func (r Rule) Equal(o any) bool

Equal returns whether Rule is equal to another value. It will not be equal if the other value cannot be casted to a Rule or *Rule.

func (Rule) HasProduction

func (r Rule) HasProduction(prod Production) bool

HasProduction returns whether the rule has a production of the exact sequence of symbols entirely.

func (Rule) MarshalBinary added in v0.3.0

func (r Rule) MarshalBinary() ([]byte, error)

MarshalBinary converts r into a slice of bytes that can be decoded with UnmarshalBinary.

func (Rule) String

func (r Rule) String() string

String returns the string representation of a rule.

func (Rule) UnitProductions

func (r Rule) UnitProductions() []Production

UnitProductions returns all productions from the Rule that are unit productions; i.e. are of the form A -> B where both A and B are non-terminals.

func (*Rule) UnmarshalBinary added in v0.3.0

func (r *Rule) UnmarshalBinary(data []byte) error

UnmarshalBinary decodes a slice of bytes created by MarshalBinary into r. All of r's fields will be replaced by the fields decoded from data.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL