Documentation ¶
Overview ¶
Package grammar implements context-free grammars and associated constructs. A context-free grammar describes a language using a series of production rules. They are often represented using BNF, which looks something like this:
SUM ::= EXPR '+' EXPR EXPR ::= int | identifier
This package provides CFG for representing and manipulating such a grammar. In ictiobus they are used to automatically generate parsers by querying a CFG for information on its rules.
Index ¶
- Variables
- type CFG
- func (g *CFG) AddRule(nonterminal string, production []string)
- func (g *CFG) AddTerm(terminal string, class lex.TokenClass)
- func (g CFG) Augmented() CFG
- func (g CFG) Copy() CFG
- func (g CFG) GenerateUniqueName(original string) string
- func (g CFG) GenerateUniqueTerminal(original string) string
- func (g CFG) HasUnreachableNonTerminals() bool
- func (g CFG) IsNonTerminal(sym string) bool
- func (g CFG) IsTerminal(sym string) bool
- func (g CFG) LR0Items() []LR0Item
- func (g CFG) LeftFactor() CFG
- func (g CFG) MarshalBinary() ([]byte, error)
- func (g CFG) NonTerminals() []string
- func (g CFG) NonTerminalsByPriority() []string
- func (g CFG) NonTerminalsByReversePriority() []string
- func (g CFG) ReachableFrom(start string, end string) (bool, []Rule)
- func (g CFG) RemoveEpsilons() CFG
- func (g CFG) RemoveLeftRecursion() CFG
- func (g *CFG) RemoveRule(nonterminal string)
- func (g *CFG) RemoveTerm(t string)
- func (g CFG) RemoveUnitProductions() CFG
- func (g CFG) RemoveUnreachableNonTerminals() CFG
- func (g *CFG) RemoveUnusedTerminals()
- func (g CFG) Rule(nonterminal string) Rule
- func (g CFG) StartSymbol() string
- func (g CFG) String() string
- func (g CFG) Term(terminal string) lex.TokenClass
- func (g CFG) TermFor(tc lex.TokenClass) string
- func (g CFG) Terminals() []string
- func (g CFG) UnitProductions() []Rule
- func (g *CFG) UnmarshalBinary(data []byte) error
- func (g CFG) UnreachableNonTerminals() []string
- func (g CFG) Validate() error
- type LR0Item
- type LR1Item
- type Production
- func (p Production) Copy() Production
- func (p Production) Equal(o any) bool
- func (p Production) HasSymbol(sym string) bool
- func (p Production) IsUnit() bool
- func (p Production) MarshalBinary() ([]byte, error)
- func (p Production) String() string
- func (p *Production) UnmarshalBinary(data []byte) error
- type Rule
- func (r Rule) CanProduce(p Production) bool
- func (r Rule) CanProduceSymbol(termOrNonTerm string) bool
- func (r Rule) Copy() Rule
- func (r Rule) Equal(o any) bool
- func (r Rule) HasProduction(prod Production) bool
- func (r Rule) MarshalBinary() ([]byte, error)
- func (r Rule) String() string
- func (r Rule) UnitProductions() []Production
- func (r *Rule) UnmarshalBinary(data []byte) error
Constants ¶
This section is empty.
Variables ¶
var ( // Epsilon represents the empty production. Epsilon = Production{""} // Error is a production used to indicate an error. Error = Production{} )
Functions ¶
This section is empty.
Types ¶
type CFG ¶ added in v1.0.0
type CFG struct { // Start is the name of the start symbol. If not set, It is assumed to be S. Start string // contains filtered or unexported fields }
CFG is a context-free grammar for a language. It holds the rules and derivations that make up the grammar, and maintains the terminal and non-terminal symbols that are in it in the order they are defined.
The zero-value is a CFG ready to be used.
func Parse ¶
Parse parses a 'grammar string' into a Grammar object. The string must have a semicolon between rules, spaces between each symbol, non-terminals must contain at least one upper-case letter. Epsilon "ε" is used for the epsilon production.
Example:
S -> A | B ; A -> a | ε ; B -> A b | c ;
func (*CFG) AddRule ¶ added in v1.0.0
AddRule adds the given production for a nonterminal. If the nonterminal has already been given, the production is added as an alternative for that nonterminal with lower priority than all others already added.
All rules require at least one symbol in the production. For episilon production, give only the empty string.
func (*CFG) AddTerm ¶ added in v1.0.0
func (g *CFG) AddTerm(terminal string, class lex.TokenClass)
AddTerm adds the given terminal along with the TokenClass that corresponds to it; tokens must be of that class in order to match the terminal.
The mapping of terminal symbol IDs to TokenClasses must be 1-to-1; i.e. It is an error to map multiple terms to the same TokenClass, and it is an error to map the same term to multiple TokenClasses.
As a result, redefining the same term will cause the old one to be removed, and during validation if multiple terminals are matched to the same tokenClass it will be considered an error.
It is an error to map any terminal to lex.TokenUndefined or lex.TokenEndOfText and attempting to do so will cause a panic.
func (CFG) Augmented ¶ added in v1.0.0
Augmented returns a new grammar that is a copy of this one but with the start symbol S changed to a new rule, S' -> S. All other rules are kept the same.
func (CFG) GenerateUniqueName ¶ added in v1.0.0
GenerateUniqueName generates a name for a non-terminal gauranteed to be unique within the grammar, based on original if one is provided.
func (CFG) GenerateUniqueTerminal ¶ added in v1.0.0
GenerateUniqueTerminal generates a name for a terminal gauranteed to be unique within the grammar, based on the given original if one is provided.
func (CFG) HasUnreachableNonTerminals ¶ added in v1.0.0
HasUnreachables returns whether the grammar currently has unreachle non-terminals.
func (CFG) IsNonTerminal ¶ added in v1.0.0
IsNonTerminal returns whether the given symbol is a non-terminal used in the CFG.
func (CFG) IsTerminal ¶ added in v1.0.0
IsTerminal returns whether the given symbol is a terminal used in the CFG.
func (CFG) LR0Items ¶ added in v1.0.0
LR0Items returns all LR0 items of rules in the grammar. These items can be used by LR parser generator algorithms to create a parsing DFA for the grammar.
func (CFG) LeftFactor ¶ added in v1.0.0
LeftFactor returns a new Grammar equivalent to this one but with all unclear alternative choices for a top-down parser are left factored to equivalent pairs of statements.
This is an implementation of Algorithm 4.21 from the purple dragon book, "Left factoring a grammar".
func (CFG) MarshalBinary ¶ added in v1.0.0
MarshalBinary converts g into a slice of bytes that can be decoded with UnmarshalBinary.
func (CFG) NonTerminals ¶ added in v1.0.0
NonTerminals returns a list of all the non-terminal symbols in the grammar. Elements in the returned slice will be in a stable order, but necessarily the same order as they were added in. All will be upper-case.
func (CFG) NonTerminalsByPriority ¶ added in v1.0.0
NonTerminalsByPriority returns list of all the non-terminal symbols in the order they were defined in. All will be upper-case.
func (CFG) NonTerminalsByReversePriority ¶ added in v1.0.0
NonTerminalsByReversePriority returns list of all the non-terminal symbols in reverse order from the order they were defined in. This is handy because it can have the effect of causing iteration to do so in a manner that a human might do looking at a grammar, reversed.
func (CFG) ReachableFrom ¶ added in v1.0.0
ReachableFrom returns whether the symbol end can be derived from symbol start with any combination of derivations. Returns whether it can be reached, and if it can, additional returns a set of Rules that each contain exactly one production; each is a derivation that must be done to a symbol of the prior derived to string to ultimately end up at end.
func (CFG) RemoveEpsilons ¶ added in v1.0.0
RemoveEpsilons returns a grammar that derives strings equivalent to the first one (with the exception of the empty string) but with all epsilon productions automatically eliminated.
Call Validate before this or it may go poorly.
func (CFG) RemoveLeftRecursion ¶ added in v1.0.0
RemoveLeftRecursion returns a grammar that has no left recursion, suitable for operations on by a top-down parsing method.
This will force immediate removal of epsilon-productions and unit-productions as well, as this algorithem only works on CFGs without those.
This is an implementation of Algorithm 4.19 from the purple dragon book, "Eliminating left recursion".
func (*CFG) RemoveRule ¶ added in v1.0.0
RemoveRule eliminates all productions of the given nonterminal from the grammar. The nonterminal will no longer be considered to be a part of the Grammar.
If the grammar already does not contain the given non-terminal this function has no effect.
func (*CFG) RemoveTerm ¶ added in v1.0.0
RemoveTerm eliminates the given terminal from the grammar. The terminal will no longer be considered a valid symbol for a rule in the Grammar to produce.
If the grammar already does not contain the given nonterminal this function has no effect.
func (CFG) RemoveUnitProductions ¶ added in v1.0.0
RemoveUnitProductions returns a Grammar that derives strings equivalent to this one but with all unit production rules removed.
func (CFG) RemoveUnreachableNonTerminals ¶ added in v1.0.0
RemoveUnreachableNonTerminals returns a grammar with all unreachable non-terminals removed.
func (*CFG) RemoveUnusedTerminals ¶ added in v1.0.0
func (g *CFG) RemoveUnusedTerminals()
RemoveUnusedTerminals removes all terminals that are not currently used by any rule in the grammar.
func (CFG) Rule ¶ added in v1.0.0
Rule returns the grammar rule for the given non-terminal symbol. If there is no rule defined for that nonterminal, a Rule with an empty NonTerminal field is returned; otherwise it will be the same string as the one passed in to the function.
func (CFG) StartSymbol ¶ added in v1.0.0
StartSymbol returns the defined start symbol for the grammar. If one is set in g.Start, that is returned, otherwise "S" is.
func (CFG) Term ¶ added in v1.0.0
func (g CFG) Term(terminal string) lex.TokenClass
Term returns the TokenClass that the given terminal symbol maps to. If the symbol is not defined in this grammar, lex.TokenUndefined is returned.
func (CFG) TermFor ¶ added in v1.0.0
func (g CFG) TermFor(tc lex.TokenClass) string
TermFor returns the term used in the grammar to represent the given TokenClass. If tc is not a TokenClass in the grammar, "" is returned.
func (CFG) Terminals ¶ added in v1.0.0
Terminals returns the set of terminal symbols in the grammar. Elements in the returned slice will be in a stable order, but it will not necessarily be related to the order they were added to the grammar.
func (CFG) UnitProductions ¶ added in v1.0.0
UnitProductions returns all production rules that are of the form A -> B, where A and B are both non-terminals. The returned list contains rules mapping the non-terminal to the other non-terminal; all other productions from the grammar will not be present.
func (*CFG) UnmarshalBinary ¶ added in v1.0.0
UnmarshalBinary decodes a slice of bytes created by MarshalBinary into g. All of g's fields will be replaced by the fields decoded from data.
func (CFG) UnreachableNonTerminals ¶ added in v1.0.0
UnreachableNonTerminals returns all non-terminals (excluding the start symbol) that are currently unreachable due to not being produced by any other grammar rule.
type LR0Item ¶
type LR0Item struct { // NonTerminal is the symbol at the head of the grammar rule that this item // is for. NonTerminal string // Left is all symbols in the production of the rule that are to the left of // the dot. Left []string // Right is all symbols in the production of the rule that are to the right // of the dot. Right []string }
LR0Item is an LR(0) item from a grammar. This is an object used for generating handle recognition systems for bottom-up parsers. Each LR0Item has a NonTerminal of the rule it relates to, and a left and right side of a dot. The dot denotes handle recognition; all symbols on the left have been seen, and all symbols on the right have yet to be seen.
func MustParseLR0Item ¶
MustParseLR0Item is identical to ]ParseLR0Item], but panics if any parse error occurs.
func ParseLR0Item ¶
ParseLR1Item parses a string of the form "NONTERM -> ALPHA.BETA" into an LR0Item.
func (LR0Item) Equal ¶
Equal returns whether the given LR0Item is equal to another LR0Item or *LR0Item.
func (LR0Item) MarshalBinary ¶ added in v0.3.0
MarshalBinary converts lr0 into a slice of bytes that can be decoded with UnmarshalBinary.
func (*LR0Item) UnmarshalBinary ¶ added in v0.3.0
UnmarshalBinary decodes a slice of bytes created by MarshalBinary into lr0. All of lr0's fields will be replaced by the fields decoded from data.
type LR1Item ¶
LR1Item is an LR(1) item from a grammar. This is the same as an LR0Item, but with a Lookahead symbol that determines when the item can be used.
func MustParseLR1Item ¶
MustParseLR1Item is identical to ParseLR1Item, but panics if any parse error occurs.
func ParseLR1Item ¶
ParseLR1Item parses a string of the form "NONTERM -> ALPHA.BETA, LOOKAHEAD" into an LR1Item.
func (LR1Item) Equal ¶
Equal returns whether the given LR0Item is equal to another LR1Item or *LR1Item.
func (LR1Item) MarshalBinary ¶ added in v0.3.0
MarshalBinary converts lr1 into a slice of bytes that can be decoded with UnmarshalBinary.
func (*LR1Item) UnmarshalBinary ¶ added in v0.3.0
UnmarshalBinary decodes a slice of bytes created by MarshalBinary into lr1. All of lr1's fields will be replaced by the fields decoded from data.
type Production ¶
type Production []string
Production is a production of a grammar rule.
func (Production) Copy ¶
func (p Production) Copy() Production
Copy returns a deep-copied duplicate of this production.
func (Production) Equal ¶
func (p Production) Equal(o any) bool
Equal returns whether Rule is equal to another value. It will not be equal if the other value cannot be cast to Production or *Production.
func (Production) HasSymbol ¶
func (p Production) HasSymbol(sym string) bool
HasSymbol returns whether the production has the given symbol in it.
func (Production) IsUnit ¶
func (p Production) IsUnit() bool
IsUnit returns whether this production is a unit production.
func (Production) MarshalBinary ¶ added in v0.3.0
func (p Production) MarshalBinary() ([]byte, error)
MarshalBinary converts p into a slice of bytes that can be decoded with UnmarshalBinary.
func (Production) String ¶
func (p Production) String() string
String returns the string representation of the Production.
func (*Production) UnmarshalBinary ¶ added in v0.3.0
func (p *Production) UnmarshalBinary(data []byte) error
UnmarshalBinary decodes a slice of bytes created by MarshalBinary into p. All of p's fields will be replaced by the fields decoded from data.
type Rule ¶
type Rule struct { NonTerminal string Productions []Production }
Rule is a set of derivation rules of a grammar for some NonTerminal.
func MustParseRule ¶ added in v0.8.0
MustParseRule is like parseRule but panics if it can't.
func (Rule) CanProduce ¶
func (r Rule) CanProduce(p Production) bool
CanProduce returns whether this rule can produce the given Production.
func (Rule) CanProduceSymbol ¶
CanProduceSymbol whether any alternative in productions produces the given term/non-terminal
func (Rule) Equal ¶
Equal returns whether Rule is equal to another value. It will not be equal if the other value cannot be casted to a Rule or *Rule.
func (Rule) HasProduction ¶
func (r Rule) HasProduction(prod Production) bool
HasProduction returns whether the rule has a production of the exact sequence of symbols entirely.
func (Rule) MarshalBinary ¶ added in v0.3.0
MarshalBinary converts r into a slice of bytes that can be decoded with UnmarshalBinary.
func (Rule) UnitProductions ¶
func (r Rule) UnitProductions() []Production
UnitProductions returns all productions from the Rule that are unit productions; i.e. are of the form A -> B where both A and B are non-terminals.
func (*Rule) UnmarshalBinary ¶ added in v0.3.0
UnmarshalBinary decodes a slice of bytes created by MarshalBinary into r. All of r's fields will be replaced by the fields decoded from data.