items

package

v1.0.1 Latest Latest Go to latest Published: May 14, 2024 License: Apache-2.0 Imports: 6 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/Desuuuu/gocc

Links

Open Source Insights

Documentation ¶

Overview ¶

Package items implements dotted items for FSA generation during the lexer generation process.

GENERALISED SUBSET ALGORITHM ¶

The problem: Lexers for some applications, e.g.: email header parsing have a number of char range input symbols, which overlap to a large extent.

The lexer symbol space can be very large, e.g.: UTF-8, so it is impractical to create items for every possible value of the char ranges.

Possible solutions:

1. Automatic conflict handling Impractical because of overlapping char ranges for each state.

2. Separate symbol classification functions for each state Better, but overlaps can still occur

3. Create a set of intersections of the valid char ranges of each set. Use the intersections as input symbols to the lexer. This allows the full state space to be generated.

DEFINITION: LexPart = (Tokens, RegDefs, TermSym)

ALGORITHM: Generate lexer sets

S0 := CreateSet0()
itemSets = {S0}
transitions = {}
repeat {
	for set in itemSets {
		for symRange in set.SymRanges() {
			if nextSet := set.Next(symRange); nextSet not in itemSets {
				add nextSet to itemSets
				add (set, symRange, nextSet) to transitions
			}
		}
	}
} until no more sets added to itemSets
return itemSets, transitions

ALGORITHM: CreateSet0()

S0 = {}
for all T : x in Tokens {
	add T : •x to S0
}
return S0.Closure()

DEFINITION: Let R be the id of a regular definition.

ALGORITHM: set.Next()

Output: the set containing all items in set that can move over symRange.

repeat
	nextSet = {}
	for I : x •c y in set and c is in symRange{
		add I : x c •y to nextSet
	}
	for R : •z in nextSet and I : x •R y in set {
		add I : x •R y to nextSet
	}
	for R : z• in nextSet and I : x •R y in set {
		add I : x R• y to nextSet
	}
until no items were added to nextSet

return nextSet.Closure()

ALGORITHM: set.Closure()

closure = {X : x •y | X : x •y in set and y not null}
repeat
	for item I : x •R y in set and R : •z in RegDefs {
		add R : •z to closure
	}
until no more items were added to closure

Disjunct character ranges ¶

DEFINITIONS

 	The alphabet, S = [s[1],s[n]], of a regular grammar is an ordered set of n symbols. Given any s[i] != s[n] of S the next symbol in S
 	can be found  	by s[i] + 1. s[n] + 1 is undefined. For any s[i] != s[1] the previous symbol can be found by s[i] - 1. |S| is the
 	size of S.

 	Closed intervals of the alphabet, S, may be declared as symbol ranges, r = [r[1],r[k]] where the size of r, |r| = k. r is an ordered
 	subset of S with n elements without gaps, i.e.: r[i] + 1 = r[i+1].

 	X : x •a y is an item  of a regular grammar, with a the expected symbol and x, y possibly empty strings of symbols from S.

	In the algorithm symbols may be elements or ranges of S. If s is an element of S, |s| = 1. If |s| = 1, s = [s,s]

ALGORITHM: set.SymRanges() - Create the lexer symbol ranges of an item set, S:

 	Input: An item set of the grammar.

 	Output: A set of non-overlapping symbol ranges.

	Let SR = {} be a initially empty set of symbol ranges.

	for all items i in S {
		let s be the expected symbol of i
		if {
		// s outside all r in SR
		case for all r in I, s[|s|] < r[1] or s[1] > r[|r|]:
			add [s[1],s[|s|] to I

		// s equal to some r in SR
		case r in SR and s[1] = r[1] and s[|s|] = r[|r|]:
			do nothing

		//s inside some r in SR
		case r in SR and r[1] = s[1] and s[|s|] < r[|r|]:
			replace r in SR by {[r[1],s[|s|]], [s[|s|]+1,r[|r|]]}
		case r in SR and r[1] < s[1] and s[|s|] < r[|r|]:
			replace r in SR by {[r[1],s[1]-1], [s[s[1],s[|s|]], s[|s]+1,r[|r|]]}
		case r in SR and r[1] < s[1] and s[|s|] = r[|r|]:
			repace r in SR by {[r[1],s[1]-1], [s[1],r[|r|]]}

		// overlap to left
		case r in SR and s[1] < r[1] <= s[|s|] < r[|r|]:
			replace r in SR by {[s[1],r[1]-1], [r[1],s[|s|]], [s[|s|]+1,r[|r|]]}
		case r in SR and s[1] < r[1] and s[|s|] = r[|r|]:
			add [s[1],r[1]-1] to SR
		}

		// overlap to right
		case r in SR and s[1] = r[1] and s[|s|] > r[|r|]:
			add [r[|r|]+1,s[|s|]] to SR
		case r in SR and [s1] > r[1] and s[|s|] > r[|r|]:
			replace r in SR by {[r[1],s[1]-1], [s[1],r[|r|]], [r[|r|]+1,s[|s|]]}
	}

Index ¶

func WriteStringNode(w io.Writer, node ast.LexNode, pos *itemPos)
type Accept
- func (this Accept) String() string
type Action
type CharRange
- func (this CharRange) Equal(that CharRange) bool
- func (this CharRange) String() string
type DisjunctRangeSet
- func NewDisjunctRangeSet() *DisjunctRangeSet
- func (this *DisjunctRangeSet) AddLexTNode(sym ast.LexTNode)
- func (this *DisjunctRangeSet) AddRange(from, to rune)
- func (this *DisjunctRangeSet) List() []CharRange
- func (this *DisjunctRangeSet) Range(i int) CharRange
- func (this *DisjunctRangeSet) Size() int
- func (this *DisjunctRangeSet) String() string
type Error
- func (this Error) String() string
type Ignore
- func (this Ignore) String() string
type Item
- func NewItem(prodId string, lexPart *ast.LexPart, symbols *symbols.Symbols) *Item
- func (this *Item) Clone() *Item
- func (this *Item) Emoves() (items []*Item)
- func (this *Item) Equal(that *Item) bool
- func (this *Item) ExpectedSymbol() (node ast.LexTNode)
- func (this *Item) HashKey() string
- func (this *Item) Move(rng CharRange) []*Item
- func (this *Item) MoveDot() []*Item
- func (this *Item) MoveRegDefId(id string) []*Item
- func (this *Item) Reduce() bool
- func (this *Item) String() string
type ItemList
- func ItemsSet0(lexPart *ast.LexPart, symbols *symbols.Symbols) ItemList
- func NewItemList(len int) ItemList
- func (this ItemList) AddExclusive(item *Item) (ItemList, error)
- func (this ItemList) AddNoDuplicate(items ...*Item) ItemList
- func (this ItemList) Closure(lexPart *ast.LexPart, symbols *symbols.Symbols) ItemList
- func (this ItemList) Contain(that *Item) bool
- func (this ItemList) ContainShift(id string) bool
- func (this ItemList) Equal(that ItemList) bool
- func (this ItemList) PrefixString(prefix string) string
- func (this ItemList) Remove(item *Item) ItemList
type ItemSet
- func NewItemSet(setNo int, lexPart *ast.LexPart, symbols *symbols.Symbols, items ItemList) *ItemSet
- func (this *ItemSet) Action() Action
- func (this *ItemSet) Add(items ...*Item)
- func (this *ItemSet) Contain(that *Item) bool
- func (this *ItemSet) Empty() bool
- func (this *ItemSet) Equal(items ItemList) bool
- func (this *ItemSet) Next(rng CharRange) ItemList
- func (this *ItemSet) NextDot() ItemList
- func (this *ItemSet) NextImport(imprt string) ItemList
- func (this *ItemSet) Size() int
- func (this *ItemSet) String() string
- func (this *ItemSet) StringItems() string
type ItemSets
- func GetItemSets(lexPart *ast.LexPart) *ItemSets
- func (this *ItemSets) Add(items ItemList) (setNo int)
- func (this *ItemSets) Closure() *ItemSets
- func (this *ItemSets) Contain(items ItemList) (yes bool, index int)
- func (this *ItemSets) List() []*ItemSet
- func (this *ItemSets) Size() int
- func (this *ItemSets) String() string
- func (this *ItemSets) Symbols() *symbols.Symbols

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func WriteStringNode ¶

func WriteStringNode(w io.Writer, node ast.LexNode, pos *itemPos)

Types ¶

type Accept ¶

type Accept string

func (Accept) String ¶

func (this Accept) String() string

type Action ¶

type Action interface {
	String() string
	// contains filtered or unexported methods
}

type CharRange ¶

type CharRange struct {
	From rune
	To   rune
}

func (CharRange) Equal ¶

func (this CharRange) Equal(that CharRange) bool

func (CharRange) String ¶

func (this CharRange) String() string

type DisjunctRangeSet ¶

type DisjunctRangeSet struct {
	MatchAny bool
	// contains filtered or unexported fields
}

set is kept sorted

func NewDisjunctRangeSet ¶

func NewDisjunctRangeSet() *DisjunctRangeSet

func (*DisjunctRangeSet) AddLexTNode ¶

func (this *DisjunctRangeSet) AddLexTNode(sym ast.LexTNode)

func (*DisjunctRangeSet) AddRange ¶

func (this *DisjunctRangeSet) AddRange(from, to rune)

|===========|

|-------| |===========| 1 |-----------+---|-------+ 2 |-----------+-----------| 3 |-----------+-----------+---| 4

|---|-------+ 5
|===========| 6
|-----------+---| 7
+===========+	|----| 8
+---|----|--+ 9
+---|-------| 10
+---|-------+---| 11

func (*DisjunctRangeSet) List ¶

func (this *DisjunctRangeSet) List() []CharRange

func (*DisjunctRangeSet) Range ¶

func (this *DisjunctRangeSet) Range(i int) CharRange

func (*DisjunctRangeSet) Size ¶

func (this *DisjunctRangeSet) Size() int

func (*DisjunctRangeSet) String ¶

func (this *DisjunctRangeSet) String() string

type Error ¶

type Error int

func (Error) String ¶

func (this Error) String() string

type Ignore ¶

type Ignore string

func (Ignore) String ¶

func (this Ignore) String() string

type Item ¶

type Item struct {
	Id   string
	Prod ast.LexProduction

	ProdIndex ast.LexProdIndex

	*symbols.Symbols
	// contains filtered or unexported fields
}

func NewItem ¶

func NewItem(prodId string, lexPart *ast.LexPart, symbols *symbols.Symbols) *Item

NewItem will return

T : s

for a lex production,

T : •s

without executing the ℇ-moves for s.

Func *Item.Emoves must be called to return the set of basic items for T : •s.

func (*Item) Clone ¶

func (this *Item) Clone() *Item

func (*Item) Emoves ¶

func (this *Item) Emoves() (items []*Item)

"this" may be a non-basic item. The function returns the set of basic items after ℇ-moves have be executed on this.

For a general description of dotted items (items) and ℇ-moves of items, see:

Modern Compiler Design. Dick Grune, et al. Second Edition. Springer 2012.

ℇ-moves for lex items:

Lex T be any production.
Let w,x,y,z be any strings of lexical symbols.
Let s = x|...|y have one or more alternatives.

Then

T : •s              =>  T : •x|...|z
                        ...
                        T :  x|...|•z

T : x•[y]z          =>  T : x[•y]z
                        T : x[y]•z

T : x[y•]z          =>  T : x[y]•z

T : x•{y}z          =>  T : x{•y}z
                        T : x{y}•z

T : x{y•}z          =>  T : x{•y}z
                        T : x{y}•z

T : w•(x|...|y)z    =>  T : w(•x|...|y)z
                        ...
                        T : w( x|...|•y)z

T : x(...|y•|...)z  =>  T : x(...|y|...)•z

func (*Item) Equal ¶

func (this *Item) Equal(that *Item) bool

func (*Item) ExpectedSymbol ¶

func (this *Item) ExpectedSymbol() (node ast.LexTNode)

ExpectedSymbol returns the expected symbol for shift items and nil for reduce items. This is the position of a basic item -- no ℇ-moves are possible.

func (*Item) HashKey ¶

func (this *Item) HashKey() string

func (*Item) Move ¶

func (this *Item) Move(rng CharRange) []*Item

func (*Item) MoveDot ¶

func (this *Item) MoveDot() []*Item

func (*Item) MoveRegDefId ¶

func (this *Item) MoveRegDefId(id string) []*Item

func (*Item) Reduce ¶

func (this *Item) Reduce() bool

Reduce returns true for return items, such as `T : xyz •`, otherwise returns false.

func (*Item) String ¶

func (this *Item) String() string

type ItemList ¶

type ItemList []*Item

Each Itemset element is a ItemList.

func ItemsSet0 ¶

func ItemsSet0(lexPart *ast.LexPart, symbols *symbols.Symbols) ItemList

func NewItemList ¶

func NewItemList(len int) ItemList

func (ItemList) AddExclusive ¶

func (this ItemList) AddExclusive(item *Item) (ItemList, error)

func (ItemList) AddNoDuplicate ¶

func (this ItemList) AddNoDuplicate(items ...*Item) ItemList

AddNoDuplicate will add any of the listed items that are not already in the list.

func (ItemList) Closure ¶

func (this ItemList) Closure(lexPart *ast.LexPart, symbols *symbols.Symbols) ItemList

See Algorithm: set.Closure() in package doc.

func (ItemList) Contain ¶

func (this ItemList) Contain(that *Item) bool

func (ItemList) ContainShift ¶

func (this ItemList) ContainShift(id string) bool

func (ItemList) Equal ¶

func (this ItemList) Equal(that ItemList) bool

func (ItemList) PrefixString ¶

func (this ItemList) PrefixString(prefix string) string

func (ItemList) Remove ¶

func (this ItemList) Remove(item *Item) ItemList

type ItemSet ¶

type ItemSet struct {
	Items ItemList

	*symbols.Symbols
	SymbolClasses     *DisjunctRangeSet
	Transitions       []int
	ImportTransitions []int
	DotTransition     int
	// contains filtered or unexported fields
}

Set of basic lexical items Each entry of Transitions is the next set for the corresponding symbo class. len(ItemSet.Transitions) == ItemSet.SymbolClasses.Size()

func NewItemSet ¶

func NewItemSet(setNo int, lexPart *ast.LexPart, symbols *symbols.Symbols, items ItemList) *ItemSet

func (*ItemSet) Action ¶

func (this *ItemSet) Action() Action

func (*ItemSet) Add ¶

func (this *ItemSet) Add(items ...*Item)

Add will return the number of deduplicated items added to the ItemSet.

func (*ItemSet) Contain ¶

func (this *ItemSet) Contain(that *Item) bool

func (*ItemSet) Empty ¶

func (this *ItemSet) Empty() bool

func (*ItemSet) Equal ¶

func (this *ItemSet) Equal(items ItemList) bool

func (*ItemSet) Next ¶

func (this *ItemSet) Next(rng CharRange) ItemList

See algorithm: set.Next() in package doc.

func (*ItemSet) NextDot ¶

func (this *ItemSet) NextDot() ItemList

func (*ItemSet) NextImport ¶

func (this *ItemSet) NextImport(imprt string) ItemList

func (*ItemSet) Size ¶

func (this *ItemSet) Size() int

func (*ItemSet) String ¶

func (this *ItemSet) String() string

func (*ItemSet) StringItems ¶

func (this *ItemSet) StringItems() string

type ItemSets ¶

type ItemSets struct {
	// contains filtered or unexported fields
}

func GetItemSets ¶

func GetItemSets(lexPart *ast.LexPart) *ItemSets

func (*ItemSets) Add ¶

func (this *ItemSets) Add(items ItemList) (setNo int)

func (*ItemSets) Closure ¶

func (this *ItemSets) Closure() *ItemSets

func (*ItemSets) Contain ¶

func (this *ItemSets) Contain(items ItemList) (yes bool, index int)

func (*ItemSets) List ¶

func (this *ItemSets) List() []*ItemSet

func (*ItemSets) Size ¶

func (this *ItemSets) Size() int

func (*ItemSets) String ¶

func (this *ItemSets) String() string

func (*ItemSets) Symbols ¶

func (this *ItemSets) Symbols() *symbols.Symbols

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL