Documentation ¶
Overview ¶
Package lr implements prerequisites for LR parsing. It is mainly intended for Markdown parsing, but may be of use for other purposes, too.
Building a Grammar ¶
Grammars are specified using a grammar builder object. Clients add rules, consisting of non-terminal symbols and terminals. Terminals carry a token value of type int. Grammars may contain epsilon-productions.
Example:
    b := lr.NewGrammarBuilder("G")
    b.LHS("S").N("A").T("a", 1).EOF() // S -> A a EOF
    b.LHS("A").N("B").N("D").End()    // A -> B D
    b.LHS("B").T("b", 2).End()        // B -> b
    b.LHS("B").Epsilon()              // B ->
    b.LHS("D").T("d", 3).End()        // D -> d
    b.LHS("D").Epsilon()              // D ->
This results in the following trivial grammar:
    b.Grammar().Dump()

    0: [S] ::= [A a #eof]
    1: [A] ::= [B D]
    2: [B] ::= [b]
    3: [B] ::= []
    4: [D] ::= [d]
    5: [D] ::= []
Static Grammar Analysis ¶
After the grammar is complete, it has to be analysed. To this end, the grammar is passed to an LRAnalysis object, which computes FIRST and FOLLOW sets for the grammar and determines all epsilon-derivable rules.
Although FIRST and FOLLOW sets are mainly intended for internal use when constructing the parser tables, the methods for getting FIRST(N) and FOLLOW(N) of a non-terminal N are exported.
    ga := lr.Analysis(g) // analyser for grammar above
    ga.Grammar().EachNonTerminal(
        func(name string, N Symbol) interface{} { // ad-hoc mapper function
            fmt.Printf("FIRST(%s) = %v", name, ga.First(N)) // get FIRST-set for N
            return nil
        })
    // Output:
    FIRST(S) = [1 2 3] // terminal token values as int, 1 = 'a'
    FIRST(A) = [0 2 3] // 0 = epsilon
    FIRST(B) = [0 2]   // 2 = 'b'
    FIRST(D) = [0 3]   // 3 = 'd'
Parser Construction ¶
Using grammar analysis as input, a bottom-up parser can be constructed. First a characteristic finite state machine (CFSM) is built from the grammar. The CFSM will then be transformed into a GOTO table (LR(0)-table) and an ACTION table for a SLR(1) parser. The CFSM will not be thrown away, but is made available to the client. This is intended for debugging purposes, but may be useful for error recovery, too. It can be exported to Graphviz's Dot-format.
Example:
    lrgen := lr.NewTableGenerator(ga) // ga is the LRAnalysis from above
    lrgen.CreateTables()              // construct LR parser tables
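The generated tables drive a standard shift-reduce loop. As an illustration of how such tables are consumed (not this package's API: the encoding below uses maps keyed by symbol names instead of sparse integer matrices), here is a self-contained driver for the toy grammar S' -> S, (1) S -> a b:

```go
package main

import "fmt"

// Hand-built SLR(1) tables for the toy grammar, for illustration only.
type action struct {
	kind string // "shift", "reduce" or "accept"
	arg  int    // target state for shift, rule number for reduce
}

var (
	actions = map[int]map[string]action{ // ACTION[state][terminal]
		0: {"a": {"shift", 2}},
		1: {"$": {"accept", 0}},
		2: {"b": {"shift", 3}},
		3: {"$": {"reduce", 1}}, // reduce by S -> a b
	}
	gotos   = map[int]map[string]int{0: {"S": 1}} // GOTO[state][non-terminal]
	ruleLHS = map[int]string{1: "S"}
	ruleLen = map[int]int{1: 2}
)

// parse runs the LR driver loop; input must end with the "$" marker.
func parse(input []string) bool {
	stack := []int{0} // state stack
	for i := 0; ; {
		state := stack[len(stack)-1]
		act, ok := actions[state][input[i]]
		if !ok {
			return false // syntax error
		}
		switch act.kind {
		case "shift":
			stack = append(stack, act.arg)
			i++
		case "reduce":
			stack = stack[:len(stack)-ruleLen[act.arg]] // pop the RHS
			top := stack[len(stack)-1]
			stack = append(stack, gotos[top][ruleLHS[act.arg]]) // GOTO on LHS
		case "accept":
			return true
		}
	}
}

func main() {
	fmt.Println(parse([]string{"a", "b", "$"})) // true
	fmt.Println(parse([]string{"b", "$"}))      // false
}
```

The loop structure (consult ACTION with the current state and lookahead, pop |RHS| states on reduce, then consult GOTO with the LHS) is the same regardless of how the tables were built.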
BSD License ¶
Copyright (c) 2017–2020, Norbert Pillmayer ¶
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of this software nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Index ¶
- Constants
- Variables
- func ActionTableAsHTML(lrgen *TableGenerator, w io.Writer)
- func Dump(iset *iteratable.Set)
- func GotoTableAsHTML(lrgen *TableGenerator, w io.Writer)
- func StartItem(r *Rule) (Item, *Symbol)
- func T() tracing.Trace
- type CFSM
- type CFSMState
- type Grammar
- func (g *Grammar) Dump()
- func (g *Grammar) EachNonTerminal(mapper func(sym *Symbol) interface{}) []interface{}
- func (g *Grammar) EachSymbol(mapper func(sym *Symbol) interface{}) []interface{}
- func (g *Grammar) EachTerminal(mapper func(sym *Symbol) interface{}) []interface{}
- func (g *Grammar) FindNonTermRules(sym *Symbol, includeEpsRules bool) *iteratable.Set
- func (g *Grammar) Rule(no int) *Rule
- func (g *Grammar) Size() int
- func (g *Grammar) SymbolByName(name string) *Symbol
- func (g *Grammar) Terminal(tokval int) *Symbol
- type GrammarBuilder
- type Item
- type LRAnalysis
- type Rule
- type RuleBuilder
- func (rb *RuleBuilder) AppendSymbol(sym *Symbol) *RuleBuilder
- func (rb *RuleBuilder) EOF() *Rule
- func (rb *RuleBuilder) End() *Rule
- func (rb *RuleBuilder) Epsilon() *Rule
- func (rb *RuleBuilder) L(s string) *RuleBuilder
- func (rb *RuleBuilder) N(s string) *RuleBuilder
- func (rb *RuleBuilder) T(s string, tokval int) *RuleBuilder
- type Span
- type Symbol
- type TableGenerator
- func (lrgen *TableGenerator) AcceptingStates() []int
- func (lrgen *TableGenerator) ActionTable() *sparse.IntMatrix
- func (lrgen *TableGenerator) BuildGotoTable() *sparse.IntMatrix
- func (lrgen *TableGenerator) BuildLR0ActionTable() (*sparse.IntMatrix, bool)
- func (lrgen *TableGenerator) BuildSLR1ActionTable() (*sparse.IntMatrix, bool)
- func (lrgen *TableGenerator) CFSM() *CFSM
- func (lrgen *TableGenerator) CreateTables()
- func (lrgen *TableGenerator) GotoTable() *sparse.IntMatrix
- type TokenizerHook
Constants ¶
    const (
        EpsilonType = 0
        EOFType     = -1    // pseudo terminal token for end of input
        NonTermType = -1000 // IDs of terminals MUST be in { -2 … -999 }
    )
Symbols' values must adhere to these ranges.
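As an illustration of these ranges (not part of the package API), a classifier for symbol values might look like this:

```go
package main

import "fmt"

const (
	EpsilonType = 0
	EOFType     = -1
	NonTermType = -1000
)

// classify mirrors the value ranges above: 0 is epsilon, -1 is EOF,
// values at or below -1000 identify non-terminals, and everything
// else is a terminal token value.
func classify(v int) string {
	switch {
	case v == EpsilonType:
		return "epsilon"
	case v == EOFType:
		return "EOF"
	case v <= NonTermType:
		return "non-terminal"
	default:
		return "terminal"
	}
}

func main() {
	for _, v := range []int{0, -1, 42, -2, -1001} {
		fmt.Printf("%5d: %s\n", v, classify(v))
	}
}
```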
    const (
        ShiftAction  = -1
        AcceptAction = -2
    )
Actions for parser action tables.
Variables ¶
var NullItem = Item{nil, 0, 0}
NullItem is the invalid item.
Functions ¶
func ActionTableAsHTML ¶
func ActionTableAsHTML(lrgen *TableGenerator, w io.Writer)
ActionTableAsHTML exports the SLR(1) ACTION-table in HTML-format.
func GotoTableAsHTML ¶
func GotoTableAsHTML(lrgen *TableGenerator, w io.Writer)
GotoTableAsHTML exports a GOTO-table in HTML-format.
Types ¶
type CFSM ¶
    type CFSM struct {
        S0 *CFSMState // start state
        // contains filtered or unexported fields
    }
CFSM is the characteristic finite state machine for an LR grammar, i.e. the LR(0) state diagram. It is constructed by a TableGenerator. Clients normally do not use it directly. Nevertheless, some methods are defined on it, e.g., for debugging purposes, or even for computing your own tables from it.
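To sketch what happens inside the CFSM construction, the following self-contained closure computation expands one LR(0) item set for the example grammar (single-character symbols for brevity; this is the textbook algorithm, not the package's own code):

```go
package main

import "fmt"

// An LR(0) item: a rule with a dot position in its right-hand side.
type item struct {
	lhs string
	rhs string // RHS as a string of single-character symbols
	dot int
}

// closure adds, for every item with the dot before a non-terminal N,
// all items N -> . alpha, until a fixpoint is reached.
func closure(set map[item]bool, rules map[string][]string) {
	for changed := true; changed; {
		changed = false
		for it := range set {
			if it.dot >= len(it.rhs) {
				continue // dot is at the end; nothing to expand
			}
			next := string(it.rhs[it.dot])
			for _, rhs := range rules[next] { // only non-terminals have rules
				fresh := item{next, rhs, 0}
				if !set[fresh] {
					set[fresh] = true
					changed = true
				}
			}
		}
	}
}

// The example grammar from the overview: S -> A a, A -> B D, B -> b | ε, D -> d | ε
var rules = map[string][]string{
	"S": {"Aa"}, "A": {"BD"}, "B": {"b", ""}, "D": {"d", ""},
}

func main() {
	state0 := map[item]bool{{"S'", "S", 0}: true} // S' -> . S
	closure(state0, rules)
	fmt.Println(len(state0)) // number of items in the CFSM start state
}
```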
func (*CFSM) CFSM2GraphViz ¶
CFSM2GraphViz exports a CFSM to the Graphviz Dot format, given a filename.
type CFSMState ¶
    type CFSMState struct {
        ID     int  // serial ID of this state
        Accept bool // is this an accepting state?
        // contains filtered or unexported fields
    }
CFSMState is a state within the CFSM for a grammar.
type Grammar ¶
    type Grammar struct {
        Name    string  // a grammar has a name, for documentation only
        Epsilon *Symbol // a special symbol representing epsilon
        EOF     *Symbol // a special symbol representing end of input
        // contains filtered or unexported fields
    }
Grammar is a type for a grammar. Usually created using a GrammarBuilder.
func (*Grammar) Dump ¶
func (g *Grammar) Dump()
Dump is a debugging helper: dump symbols and rules to stdout.
func (*Grammar) EachNonTerminal ¶
EachNonTerminal iterates over all non-terminal symbols of the grammar. Return values of the mapper function for all non-terminals are returned as an array.
func (*Grammar) EachSymbol ¶
EachSymbol iterates over all symbols of the grammar. Return values of the mapper function are returned as an array.
func (*Grammar) EachTerminal ¶
EachTerminal iterates over all terminals of the grammar. Return values of the mapper function for all terminals are returned as an array.
func (*Grammar) FindNonTermRules ¶
func (g *Grammar) FindNonTermRules(sym *Symbol, includeEpsRules bool) *iteratable.Set
FindNonTermRules returns a set of Earley-items, where each item stems from a rule with a given LHS and the dot is at position 0.
func (*Grammar) SymbolByName ¶
SymbolByName gets a symbol for a given name, if found in the grammar.
type GrammarBuilder ¶
type GrammarBuilder struct {
// contains filtered or unexported fields
}
A GrammarBuilder is used to construct a Grammar.
    b := NewGrammarBuilder("G")
    b.LHS("S").N("A").T("a", 1).End() // S -> A a
    b.LHS("A").N("B").N("D").End()    // A -> B D
    b.LHS("B").T("b", 2).End()        // B -> b
    b.LHS("B").Epsilon()              // B ->
    b.LHS("D").T("d", 3).End()        // D -> d
    b.LHS("D").Epsilon()              // D ->
This results in the following grammar:

    b.Grammar().Dump()

    0: [S'] ::= [S]
    1: [S]  ::= [A a]
    2: [A]  ::= [B D]
    3: [B]  ::= [b]
    4: [B]  ::= []
    5: [D]  ::= [d]
    6: [D]  ::= []
A call to b.Grammar() returns the (completed) grammar.
func NewGrammarBuilder ¶
func NewGrammarBuilder(gname string) *GrammarBuilder
NewGrammarBuilder gets a new grammar builder, given the name of the grammar to build.
func (*GrammarBuilder) Grammar ¶
func (gb *GrammarBuilder) Grammar() (*Grammar, error)
Grammar returns the (completed) grammar.
func (*GrammarBuilder) LHS ¶
func (gb *GrammarBuilder) LHS(s string) *RuleBuilder
LHS starts a rule given the left hand side symbol (non-terminal).
func (*GrammarBuilder) SetTokenizerHook ¶
func (gb *GrammarBuilder) SetTokenizerHook(hook TokenizerHook)
SetTokenizerHook sets a tokenizer hook, which will be called by the grammar to produce terminal tokens.
type Item ¶
    type Item struct {
        Origin uint64 // start position in input
        // contains filtered or unexported fields
    }
Item is an Earley item.
func (Item) Advance ¶
Advance advances the dot of an item over the next symbol. Returns NullItem if the dot is already past the last symbol.
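The dot semantics of Advance and PeekSymbol can be sketched with a minimal stand-in type (illustration only; the package's Item keeps rule and dot position in unexported fields):

```go
package main

import "fmt"

// miniItem is a simplified Earley item: an RHS and a dot position.
type miniItem struct {
	rhs []string
	dot int
}

// peekSymbol returns the symbol after the dot, or "" at the end.
func (it miniItem) peekSymbol() string {
	if it.dot >= len(it.rhs) {
		return ""
	}
	return it.rhs[it.dot]
}

// advance moves the dot over the next symbol; ok is false when the
// dot is already past the last symbol (cf. NullItem).
func (it miniItem) advance() (miniItem, bool) {
	if it.dot >= len(it.rhs) {
		return miniItem{}, false
	}
	return miniItem{it.rhs, it.dot + 1}, true
}

func main() {
	it := miniItem{rhs: []string{"B", "D"}} // A -> . B D
	for ok := true; ok; it, ok = it.advance() {
		fmt.Printf("dot before %q\n", it.peekSymbol())
	}
}
```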
func (Item) PeekSymbol ¶
PeekSymbol returns the symbol after the dot, if any.
type LRAnalysis ¶
type LRAnalysis struct {
// contains filtered or unexported fields
}
LRAnalysis is an object for grammar analysis (compute FIRST and FOLLOW sets).
func Analysis ¶
func Analysis(g *Grammar) *LRAnalysis
Analysis creates an analyser for a grammar. The analyser immediately starts its work and computes FIRST and FOLLOW sets.
func (*LRAnalysis) CleanUp ¶
func (ga *LRAnalysis) CleanUp() error
CleanUp cleans a context free grammar by removing unproductive non-terminals and rules, and by removing unreachable non-terminals.
If the returned error is of kind ProductionsRemovedError, sub-errors give details about the removals. Check with errors.Is(…).
func (*LRAnalysis) DerivesEpsilon ¶
func (ga *LRAnalysis) DerivesEpsilon(sym *Symbol) bool
DerivesEpsilon returns true if there are rules in the grammar which let a given symbol sym be derived to epsilon.
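What DerivesEpsilon reports can be reproduced for the example grammar with a small fixpoint computation (a sketch of the standard algorithm, not the package's implementation):

```go
package main

import "fmt"

// nullables computes the set of non-terminals that derive epsilon,
// by iterating until no new nullable symbol is found.
func nullables(rules map[string][][]string) map[string]bool {
	nullable := map[string]bool{}
	for changed := true; changed; {
		changed = false
		for lhs, alts := range rules {
			if nullable[lhs] {
				continue
			}
			for _, rhs := range alts {
				all := true
				for _, sym := range rhs {
					if !nullable[sym] { // terminals are never nullable
						all = false
						break
					}
				}
				if all { // this alternative derives epsilon
					nullable[lhs] = true
					changed = true
					break
				}
			}
		}
	}
	return nullable
}

// The example grammar from the overview.
var rules = map[string][][]string{
	"S": {{"A", "a"}},
	"A": {{"B", "D"}},
	"B": {{"b"}, {}}, // {} is an epsilon alternative
	"D": {{"d"}, {}},
}

func main() {
	n := nullables(rules)
	fmt.Println(n["S"], n["A"], n["B"], n["D"]) // false true true true
}
```

Note that A is nullable although it has no epsilon production of its own: both B and D can vanish, so A can, too.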
func (*LRAnalysis) First ¶
func (ga *LRAnalysis) First(sym *Symbol) *intsets.Sparse
First returns the FIRST set for a non-terminal. Returns a set of token values.
func (*LRAnalysis) Follow ¶
func (ga *LRAnalysis) Follow(sym *Symbol) *intsets.Sparse
Follow returns the FOLLOW set for a non-terminal. Returns a set of token values.
func (*LRAnalysis) Grammar ¶
func (ga *LRAnalysis) Grammar() *Grammar
Grammar returns the grammar this analyser operates on.
type Rule ¶
    type Rule struct {
        Serial int     // order number of this rule within a grammar
        LHS    *Symbol // symbol of the left hand side
        // contains filtered or unexported fields
    }
A Rule is a type for rules of a grammar. Rules cannot be shared between grammars.
type RuleBuilder ¶
type RuleBuilder struct {
// contains filtered or unexported fields
}
RuleBuilder is a builder type for rules.
func (*RuleBuilder) AppendSymbol ¶
func (rb *RuleBuilder) AppendSymbol(sym *Symbol) *RuleBuilder
AppendSymbol appends your own symbol objects to the builder to extend the RHS of a rule. Clients will have to make sure that no two different symbols have the same ID and that no symbol ID equals a token value of a non-terminal. This restriction is necessary to help produce correct GOTO tables for LR-parsing.
func (*RuleBuilder) EOF ¶
func (rb *RuleBuilder) EOF() *Rule
EOF appends EOF as a (terminal) symbol to a rule. This is usually not called by clients, but rather internally by the grammar builder. Call it directly only if you know what you are doing.
This completes the rule (no other builder calls should be made for this rule).
func (*RuleBuilder) End ¶
func (rb *RuleBuilder) End() *Rule
End a rule. This completes the rule (no other builder calls should be made for this rule).
func (*RuleBuilder) Epsilon ¶
func (rb *RuleBuilder) Epsilon() *Rule
Epsilon sets epsilon as the RHS of a production. This must be called directly after rb.LHS(...). It closes the rule, thus no call to End() or EOF() must follow.
func (*RuleBuilder) L ¶
func (rb *RuleBuilder) L(s string) *RuleBuilder
L appends a terminal/lexeme to the builder. This will create a symbol for a terminal, with a token value > -1000. This convention follows the stdlib package text/scanner, which uses token values > 0 for single-rune tokens and token values < 0 for common language elements like identifiers, strings, numbers, etc. (it is assumed that no symbol set will require more than 1000 of such language elements). The method call will panic if this restriction is violated.
The token value will either be generated from an internal sequence, or – if a tokenizer-hook is set – by the hook.
func (*RuleBuilder) N ¶
func (rb *RuleBuilder) N(s string) *RuleBuilder
N appends a non-terminal to the builder. The internal symbol created for the non-terminal will have an ID less than -1000.
func (*RuleBuilder) T ¶
func (rb *RuleBuilder) T(s string, tokval int) *RuleBuilder
T appends a terminal to the builder. The symbol created for the terminal must have a token value > -1000 and must not have the values 0 or -1. This convention follows the stdlib package text/scanner, which uses token values > 0 for single-rune tokens and token values < 0 for common language elements like identifiers, strings, numbers, etc. (it is assumed that no symbol set will require more than 1000 of such language elements). The method call will panic if this restriction is violated.
type Span ¶
type Span [2]uint64 // (x…y)
Span is a small type for capturing the extent of a run of input tokens. For every terminal and non-terminal, a parse tree/forest will track which input positions the symbol covers. A span denotes a start position and the position just behind the end.
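The half-open convention can be made concrete with a small sketch; the accessor names here are illustrative assumptions, not necessarily methods of the package's Span type:

```go
package main

import "fmt"

// Span mirrors the type above: a half-open interval [from, to) over
// input positions, where to is the position just behind the end.
type Span [2]uint64

func (s Span) From() uint64 { return s[0] }            // first covered position
func (s Span) To() uint64   { return s[1] }            // position just behind the end
func (s Span) Len() uint64  { return s[1] - s[0] }     // number of covered positions

func main() {
	s := Span{3, 7} // the symbol covers input positions 3, 4, 5, 6
	fmt.Println(s.From(), s.To(), s.Len()) // 3 7 4
}
```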
type Symbol ¶
Symbol is a symbol type used for grammars and grammar builders.
func (*Symbol) IsTerminal ¶
IsTerminal returns true if this symbol represents a terminal.
type TableGenerator ¶
    type TableGenerator struct {
        HasConflicts bool
        // contains filtered or unexported fields
    }
TableGenerator is a generator object to construct LR parser tables. Clients usually create a Grammar G, then an LRAnalysis object for G, and then a table generator. TableGenerator.CreateTables() constructs the CFSM and the parser tables for an LR parser recognizing grammar G.
func NewTableGenerator ¶
func NewTableGenerator(ga *LRAnalysis) *TableGenerator
NewTableGenerator creates a new TableGenerator for a (previously analysed) grammar.
func (*TableGenerator) AcceptingStates ¶
func (lrgen *TableGenerator) AcceptingStates() []int
AcceptingStates returns all states of the CFSM which represent an accept action. Clients have to call CreateTables() first.
func (*TableGenerator) ActionTable ¶
func (lrgen *TableGenerator) ActionTable() *sparse.IntMatrix
ActionTable returns the ACTION table for LR-parsing a grammar. The tables have to be built by calling CreateTables() previously (or a separate call to BuildSLR1ActionTable(...).)
func (*TableGenerator) BuildGotoTable ¶
func (lrgen *TableGenerator) BuildGotoTable() *sparse.IntMatrix
BuildGotoTable builds the GOTO table. This is normally not called directly, but rather via CreateTables().
func (*TableGenerator) BuildLR0ActionTable ¶
func (lrgen *TableGenerator) BuildLR0ActionTable() (*sparse.IntMatrix, bool)
BuildLR0ActionTable constructs the LR(0) ACTION table. This method is not called by CreateTables(), as we normally use an SLR(1) parser and therefore an action table with lookahead included. This method is provided as an add-on.
func (*TableGenerator) BuildSLR1ActionTable ¶
func (lrgen *TableGenerator) BuildSLR1ActionTable() (*sparse.IntMatrix, bool)
BuildSLR1ActionTable constructs the SLR(1) Action table. This method is normally not called by clients, but rather via CreateTables(). It builds an action table including lookahead (using the FOLLOW-set created by the grammar analyzer).
func (*TableGenerator) CFSM ¶
func (lrgen *TableGenerator) CFSM() *CFSM
CFSM returns the characteristic finite state machine (CFSM) for a grammar. Usually clients call lrgen.CreateTables() beforehand, but it is possible to call lrgen.CFSM() directly. The CFSM will be created, if it has not been constructed previously.
func (*TableGenerator) CreateTables ¶
func (lrgen *TableGenerator) CreateTables()
CreateTables creates the necessary data structures for an SLR parser.
func (*TableGenerator) GotoTable ¶
func (lrgen *TableGenerator) GotoTable() *sparse.IntMatrix
GotoTable returns the GOTO table for LR-parsing a grammar. The tables have to be built by calling CreateTables() previously (or a separate call to BuildGotoTable(...).)
type TokenizerHook ¶
TokenizerHook is an interface for a hook function, which will produce properties for a valid terminal token for this grammar.
Directories ¶
Path | Synopsis
---|---
dss | Package dss implements variants of a DAG-structured stack (DSS).
earley | Package earley provides an Earley-Parser.
glr | Package glr implements a small-scale GLR(1)-parser.
iteratable | Package iteratable implements iteratable container data structures.
slr | Package slr provides an SLR(1)-parser.
sparse | Package sparse implements a simple type for sparse integer matrices.
sppf | Package sppf implements a "Shared Packed Parse Forest".