Documentation ¶
Overview ¶
Package items implements dotted items for FSA generation during the lexer generation process.
GENERALISED SUBSET ALGORITHM ¶
The problem: Lexers for some applications, e.g.: email header parsing have a number of char range input symbols, which overlap to a large extent.
The lexer symbol space can be very large, e.g.: UTF-8, so it is impractical to create items for every possible value of the char ranges.
Possible solutions:
1. Automatic conflict handling Impractical because of overlapping char ranges for each state.
2. Separate symbol classification functions for each state Better, but overlaps can still occur
3. Create a set of intersections of the valid char ranges of each set. Use the intersections as input symbols to the lexer. This allows the full state space to be generated.
DEFINITION: LexPart = (Tokens, RegDefs, TermSym)
ALGORITHM: Generate lexer sets
S0 := CreateSet0() itemSets = {S0} transitions = {} repeat { for set in itemSets { for symRange in set.SymRanges() { if nextSet := set.Next(symRange); nextSet not in itemSets { add nextSet to itemSets add (set, symRange, nextSet) to transitions } } } } until no more sets added to itemSets return itemSets, transitions
ALGORITHM: CreateSet0()
S0 = {} for all T : x in Tokens { add T : •x to S0 } return S0.Closure()
DEFINITION: Let R be the id of a regular definition.
ALGORITHM: set.Next()
Output: the set containing all items in set that can move over symRange. repeat nextSet = {} for I : x •c y in set and c is in symRange{ add I : x c •y to nextSet } for R : •z in nextSet and I : x •R y in set { add I : x •R y to nextSet } for R : z• in nextSet and I : x •R y in set { add I : x R• y to nextSet } until no items were added to nextSet return nextSet.Closure()
ALGORITHM: set.Closure()
closure = {X : x •y | X : x •y in set and y not null} repeat for item I : x •R y in set and R : •z in RegDefs { add R : •z to closure } until no more items were added to closure
Disjunct character ranges ¶
DEFINITIONS
The alphabet, S = [s[1],s[n]], of a regular grammar is an ordered set of n symbols. Given any s[i] != s[n] of S the next symbol in S can be found by s[i] + 1. s[n] + 1 is undefined. For any s[i] != s[1] the previous symbol can be found by s[i] - 1. |S| is the size of S. Closed intervals of the alphabet, S, may be declared as symbol ranges, r = [r[1],r[k]] where the size of r, |r| = k. r is an ordered subset of S with n elements without gaps, i.e.: r[i] + 1 = r[i+1]. X : x •a y is an item of a regular grammar, with a the expected symbol and x, y possibly empty strings of symbols from S. In the algorithm symbols may be elements or ranges of S. If s is an element of S, |s| = 1. If |s| = 1, s = [s,s]
ALGORITHM: set.SymRanges() - Create the lexer symbol ranges of an item set, S:
Input: An item set of the grammar. Output: A set of non-overlapping symbol ranges. Let SR = {} be a initially empty set of symbol ranges. for all items i in S { let s be the expected symbol of i if { // s outside all r in SR case for all r in I, s[|s|] < r[1] or s[1] > r[|r|]: add [s[1],s[|s|] to I // s equal to some r in SR case r in SR and s[1] = r[1] and s[|s|] = r[|r|]: do nothing //s inside some r in SR case r in SR and r[1] = s[1] and s[|s|] < r[|r|]: replace r in SR by {[r[1],s[|s|]], [s[|s|]+1,r[|r|]]} case r in SR and r[1] < s[1] and s[|s|] < r[|r|]: replace r in SR by {[r[1],s[1]-1], [s[s[1],s[|s|]], s[|s]+1,r[|r|]]} case r in SR and r[1] < s[1] and s[|s|] = r[|r|]: repace r in SR by {[r[1],s[1]-1], [s[1],r[|r|]]} // overlap to left case r in SR and s[1] < r[1] <= s[|s|] < r[|r|]: replace r in SR by {[s[1],r[1]-1], [r[1],s[|s|]], [s[|s|]+1,r[|r|]]} case r in SR and s[1] < r[1] and s[|s|] = r[|r|]: add [s[1],r[1]-1] to SR } // overlap to right case r in SR and s[1] = r[1] and s[|s|] > r[|r|]: add [r[|r|]+1,s[|s|]] to SR case r in SR and [s1] > r[1] and s[|s|] > r[|r|]: replace r in SR by {[r[1],s[1]-1], [s[1],r[|r|]], [r[|r|]+1,s[|s|]]} }
Index ¶
- func WriteStringNode(w io.Writer, node ast.LexNode, pos *itemPos)
- type Accept
- type Action
- type CharRange
- type DisjunctRangeSet
- func (this *DisjunctRangeSet) AddLexTNode(sym ast.LexTNode)
- func (this *DisjunctRangeSet) AddRange(from, to rune)
- func (this *DisjunctRangeSet) List() []CharRange
- func (this *DisjunctRangeSet) Range(i int) CharRange
- func (this *DisjunctRangeSet) Size() int
- func (this *DisjunctRangeSet) String() string
- type Error
- type Ignore
- type Item
- func (this *Item) Clone() *Item
- func (this *Item) Emoves() (items []*Item)
- func (this *Item) Equal(that *Item) bool
- func (this *Item) ExpectedSymbol() (node ast.LexTNode)
- func (this *Item) HashKey() string
- func (this *Item) Move(rng CharRange) []*Item
- func (this *Item) MoveDot() []*Item
- func (this *Item) MoveRegDefId(id string) []*Item
- func (this *Item) Reduce() bool
- func (this *Item) String() string
- type ItemList
- func (this ItemList) AddExclusive(item *Item) (ItemList, error)
- func (this ItemList) AddNoDuplicate(items ...*Item) ItemList
- func (this ItemList) Closure(lexPart *ast.LexPart, symbols *symbols.Symbols) ItemList
- func (this ItemList) Contain(that *Item) bool
- func (this ItemList) ContainShift(id string) bool
- func (this ItemList) Equal(that ItemList) bool
- func (this ItemList) PrefixString(prefix string) string
- func (this ItemList) Remove(item *Item) ItemList
- type ItemSet
- func (this *ItemSet) Action() Action
- func (this *ItemSet) Add(items ...*Item)
- func (this *ItemSet) Contain(that *Item) bool
- func (this *ItemSet) Empty() bool
- func (this *ItemSet) Equal(items ItemList) bool
- func (this *ItemSet) Next(rng CharRange) ItemList
- func (this *ItemSet) NextDot() ItemList
- func (this *ItemSet) NextImport(imprt string) ItemList
- func (this *ItemSet) Size() int
- func (this *ItemSet) String() string
- func (this *ItemSet) StringItems() string
- type ItemSets
- func (this *ItemSets) Add(items ItemList) (setNo int)
- func (this *ItemSets) Closure() *ItemSets
- func (this *ItemSets) Contain(items ItemList) (yes bool, index int)
- func (this *ItemSets) List() []*ItemSet
- func (this *ItemSets) Size() int
- func (this *ItemSets) String() string
- func (this *ItemSets) Symbols() *symbols.Symbols
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type DisjunctRangeSet ¶
type DisjunctRangeSet struct { MatchAny bool // contains filtered or unexported fields }
set is kept sorted
func NewDisjunctRangeSet ¶
func NewDisjunctRangeSet() *DisjunctRangeSet
func (*DisjunctRangeSet) AddLexTNode ¶
func (this *DisjunctRangeSet) AddLexTNode(sym ast.LexTNode)
func (*DisjunctRangeSet) AddRange ¶
func (this *DisjunctRangeSet) AddRange(from, to rune)
|===========|
|-------| |===========| 1 |-----------+---|-------+ 2 |-----------+-----------| 3 |-----------+-----------+---| 4
|---|-------+ 5 |===========| 6 |-----------+---| 7 +===========+ |----| 8 +---|----|--+ 9 +---|-------| 10 +---|-------+---| 11
func (*DisjunctRangeSet) List ¶
func (this *DisjunctRangeSet) List() []CharRange
func (*DisjunctRangeSet) Range ¶
func (this *DisjunctRangeSet) Range(i int) CharRange
func (*DisjunctRangeSet) Size ¶
func (this *DisjunctRangeSet) Size() int
func (*DisjunctRangeSet) String ¶
func (this *DisjunctRangeSet) String() string
type Item ¶
type Item struct { Id string Prod ast.LexProduction ProdIndex ast.LexProdIndex *symbols.Symbols // contains filtered or unexported fields }
func NewItem ¶
NewItem will return
T : s
for a lex production,
T : •s
without executing the ℇ-moves for s.
Func *Item.Emoves must be called to return the set of basic items for T : •s.
func (*Item) Emoves ¶
"this" may be a non-basic item. The function returns the set of basic items after ℇ-moves have be executed on this.
For a general description of dotted items (items) and ℇ-moves of items, see:
Modern Compiler Design. Dick Grune, et al. Second Edition. Springer 2012.
ℇ-moves for lex items:
Lex T be any production. Let w,x,y,z be any strings of lexical symbols. Let s = x|...|y have one or more alternatives. Then T : •s => T : •x|...|z ... T : x|...|•z T : x•[y]z => T : x[•y]z T : x[y]•z T : x[y•]z => T : x[y]•z T : x•{y}z => T : x{•y}z T : x{y}•z T : x{y•}z => T : x{•y}z T : x{y}•z T : w•(x|...|y)z => T : w(•x|...|y)z ... T : w( x|...|•y)z T : x(...|y•|...)z => T : x(...|y|...)•z
func (*Item) ExpectedSymbol ¶
ExpectedSymbol returns the expected symbol for shift items and nil for reduce items. This is the position of a basic item -- no ℇ-moves are possible.
func (*Item) MoveRegDefId ¶
type ItemList ¶
type ItemList []*Item
Each Itemset element is a ItemList.
func NewItemList ¶
func (ItemList) AddNoDuplicate ¶
AddNoDuplicate will add any of the listed items that are not already in the list.
func (ItemList) ContainShift ¶
func (ItemList) PrefixString ¶
type ItemSet ¶
type ItemSet struct { Items ItemList *symbols.Symbols SymbolClasses *DisjunctRangeSet Transitions []int ImportTransitions []int DotTransition int // contains filtered or unexported fields }
Set of basic lexical items Each entry of Transitions is the next set for the corresponding symbo class. len(ItemSet.Transitions) == ItemSet.SymbolClasses.Size()
func NewItemSet ¶
func (*ItemSet) NextImport ¶
func (*ItemSet) StringItems ¶
type ItemSets ¶
type ItemSets struct {
// contains filtered or unexported fields
}