items

package
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 14, 2024 License: Apache-2.0 Imports: 6 Imported by: 0

Documentation

Overview

Package items implements dotted items for FSA generation during the lexer generation process.

GENERALISED SUBSET ALGORITHM

The problem: Lexers for some applications, e.g.: email header parsing have a number of char range input symbols, which overlap to a large extent.

The lexer symbol space can be very large, e.g.: UTF-8, so it is impractical to create items for every possible value of the char ranges.

Possible solutions:

1. Automatic conflict handling Impractical because of overlapping char ranges for each state.

2. Separate symbol classification functions for each state Better, but overlaps can still occur

3. Create a set of intersections of the valid char ranges of each set. Use the intersections as input symbols to the lexer. This allows the full state space to be generated.

DEFINITION: LexPart = (Tokens, RegDefs, TermSym)

ALGORITHM: Generate lexer sets

S0 := CreateSet0()
itemSets = {S0}
transitions = {}
repeat {
	for set in itemSets {
		for symRange in set.SymRanges() {
			if nextSet := set.Next(symRange); nextSet not in itemSets {
				add nextSet to itemSets
				add (set, symRange, nextSet) to transitions
			}
		}
	}
} until no more sets added to itemSets
return itemSets, transitions

ALGORITHM: CreateSet0()

S0 = {}
for all T : x in Tokens {
	add T : •x to S0
}
return S0.Closure()

DEFINITION: Let R be the id of a regular definition.

ALGORITHM: set.Next()

Output: the set containing all items in set that can move over symRange.

repeat
	nextSet = {}
	for I : x •c y in set and c is in symRange{
		add I : x c •y to nextSet
	}
	for R : •z in nextSet and I : x •R y in set {
		add I : x •R y to nextSet
	}
	for R : z• in nextSet and I : x •R y in set {
		add I : x R• y to nextSet
	}
until no items were added to nextSet

return nextSet.Closure()

ALGORITHM: set.Closure()

closure = {X : x •y | X : x •y in set and y not null}
repeat
	for item I : x •R y in set and R : •z in RegDefs {
		add R : •z to closure
	}
until no more items were added to closure

Disjunct character ranges

DEFINITIONS

 	The alphabet, S = [s[1],s[n]], of a regular grammar is an ordered set of n symbols. Given any s[i] != s[n] of S the next symbol in S
 	can be found  	by s[i] + 1. s[n] + 1 is undefined. For any s[i] != s[1] the previous symbol can be found by s[i] - 1. |S| is the
 	size of S.

 	Closed intervals of the alphabet, S, may be declared as symbol ranges, r = [r[1],r[k]] where the size of r, |r| = k. r is an ordered
 	subset of S with n elements without gaps, i.e.: r[i] + 1 = r[i+1].

 	X : x •a y is an item  of a regular grammar, with a the expected symbol and x, y possibly empty strings of symbols from S.

	In the algorithm symbols may be elements or ranges of S. If s is an element of S, |s| = 1. If |s| = 1, s = [s,s]

ALGORITHM: set.SymRanges() - Create the lexer symbol ranges of an item set, S:

 	Input: An item set of the grammar.

 	Output: A set of non-overlapping symbol ranges.

	Let SR = {} be a initially empty set of symbol ranges.

	for all items i in S {
		let s be the expected symbol of i
		if {
		// s outside all r in SR
		case for all r in I, s[|s|] < r[1] or s[1] > r[|r|]:
			add [s[1],s[|s|] to I

		// s equal to some r in SR
		case r in SR and s[1] = r[1] and s[|s|] = r[|r|]:
			do nothing

		//s inside some r in SR
		case r in SR and r[1] = s[1] and s[|s|] < r[|r|]:
			replace r in SR by {[r[1],s[|s|]], [s[|s|]+1,r[|r|]]}
		case r in SR and r[1] < s[1] and s[|s|] < r[|r|]:
			replace r in SR by {[r[1],s[1]-1], [s[s[1],s[|s|]], s[|s]+1,r[|r|]]}
		case r in SR and r[1] < s[1] and s[|s|] = r[|r|]:
			repace r in SR by {[r[1],s[1]-1], [s[1],r[|r|]]}

		// overlap to left
		case r in SR and s[1] < r[1] <= s[|s|] < r[|r|]:
			replace r in SR by {[s[1],r[1]-1], [r[1],s[|s|]], [s[|s|]+1,r[|r|]]}
		case r in SR and s[1] < r[1] and s[|s|] = r[|r|]:
			add [s[1],r[1]-1] to SR
		}

		// overlap to right
		case r in SR and s[1] = r[1] and s[|s|] > r[|r|]:
			add [r[|r|]+1,s[|s|]] to SR
		case r in SR and [s1] > r[1] and s[|s|] > r[|r|]:
			replace r in SR by {[r[1],s[1]-1], [s[1],r[|r|]], [r[|r|]+1,s[|s|]]}
	}

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func WriteStringNode

func WriteStringNode(w io.Writer, node ast.LexNode, pos *itemPos)

Types

type Accept

type Accept string

func (Accept) String

func (this Accept) String() string

type Action

type Action interface {
	String() string
	// contains filtered or unexported methods
}

type CharRange

type CharRange struct {
	From rune
	To   rune
}

func (CharRange) Equal

func (this CharRange) Equal(that CharRange) bool

func (CharRange) String

func (this CharRange) String() string

type DisjunctRangeSet

type DisjunctRangeSet struct {
	MatchAny bool
	// contains filtered or unexported fields
}

set is kept sorted

func NewDisjunctRangeSet

func NewDisjunctRangeSet() *DisjunctRangeSet

func (*DisjunctRangeSet) AddLexTNode

func (this *DisjunctRangeSet) AddLexTNode(sym ast.LexTNode)

func (*DisjunctRangeSet) AddRange

func (this *DisjunctRangeSet) AddRange(from, to rune)
|===========|

|-------| |===========| 1 |-----------+---|-------+ 2 |-----------+-----------| 3 |-----------+-----------+---| 4

|---|-------+ 5
|===========| 6
|-----------+---| 7
+===========+	|----| 8
+---|----|--+ 9
+---|-------| 10
+---|-------+---| 11

func (*DisjunctRangeSet) List

func (this *DisjunctRangeSet) List() []CharRange

func (*DisjunctRangeSet) Range

func (this *DisjunctRangeSet) Range(i int) CharRange

func (*DisjunctRangeSet) Size

func (this *DisjunctRangeSet) Size() int

func (*DisjunctRangeSet) String

func (this *DisjunctRangeSet) String() string

type Error

type Error int

func (Error) String

func (this Error) String() string

type Ignore

type Ignore string

func (Ignore) String

func (this Ignore) String() string

type Item

type Item struct {
	Id   string
	Prod ast.LexProduction

	ProdIndex ast.LexProdIndex

	*symbols.Symbols
	// contains filtered or unexported fields
}

func NewItem

func NewItem(prodId string, lexPart *ast.LexPart, symbols *symbols.Symbols) *Item

NewItem will return

T : s

for a lex production,

T : •s

without executing the ℇ-moves for s.

Func *Item.Emoves must be called to return the set of basic items for T : •s.

func (*Item) Clone

func (this *Item) Clone() *Item

func (*Item) Emoves

func (this *Item) Emoves() (items []*Item)

"this" may be a non-basic item. The function returns the set of basic items after ℇ-moves have be executed on this.

For a general description of dotted items (items) and ℇ-moves of items, see:

Modern Compiler Design. Dick Grune, et al. Second Edition. Springer 2012.

ℇ-moves for lex items:

Lex T be any production.
Let w,x,y,z be any strings of lexical symbols.
Let s = x|...|y have one or more alternatives.

Then

T : •s              =>  T : •x|...|z
                        ...
                        T :  x|...|•z

T : x•[y]z          =>  T : x[•y]z
                        T : x[y]•z

T : x[y•]z          =>  T : x[y]•z

T : x•{y}z          =>  T : x{•y}z
                        T : x{y}•z

T : x{y•}z          =>  T : x{•y}z
                        T : x{y}•z

T : w•(x|...|y)z    =>  T : w(•x|...|y)z
                        ...
                        T : w( x|...|•y)z

T : x(...|y•|...)z  =>  T : x(...|y|...)•z

func (*Item) Equal

func (this *Item) Equal(that *Item) bool

func (*Item) ExpectedSymbol

func (this *Item) ExpectedSymbol() (node ast.LexTNode)

ExpectedSymbol returns the expected symbol for shift items and nil for reduce items. This is the position of a basic item -- no ℇ-moves are possible.

func (*Item) HashKey

func (this *Item) HashKey() string

func (*Item) Move

func (this *Item) Move(rng CharRange) []*Item

func (*Item) MoveDot

func (this *Item) MoveDot() []*Item

func (*Item) MoveRegDefId

func (this *Item) MoveRegDefId(id string) []*Item

func (*Item) Reduce

func (this *Item) Reduce() bool

Reduce returns true for return items, such as `T : xyz •`, otherwise returns false.

func (*Item) String

func (this *Item) String() string

type ItemList

type ItemList []*Item

Each Itemset element is a ItemList.

func ItemsSet0

func ItemsSet0(lexPart *ast.LexPart, symbols *symbols.Symbols) ItemList

func NewItemList

func NewItemList(len int) ItemList

func (ItemList) AddExclusive

func (this ItemList) AddExclusive(item *Item) (ItemList, error)

func (ItemList) AddNoDuplicate

func (this ItemList) AddNoDuplicate(items ...*Item) ItemList

AddNoDuplicate will add any of the listed items that are not already in the list.

func (ItemList) Closure

func (this ItemList) Closure(lexPart *ast.LexPart, symbols *symbols.Symbols) ItemList

See Algorithm: set.Closure() in package doc.

func (ItemList) Contain

func (this ItemList) Contain(that *Item) bool

func (ItemList) ContainShift

func (this ItemList) ContainShift(id string) bool

func (ItemList) Equal

func (this ItemList) Equal(that ItemList) bool

func (ItemList) PrefixString

func (this ItemList) PrefixString(prefix string) string

func (ItemList) Remove

func (this ItemList) Remove(item *Item) ItemList

type ItemSet

type ItemSet struct {
	Items ItemList

	*symbols.Symbols
	SymbolClasses     *DisjunctRangeSet
	Transitions       []int
	ImportTransitions []int
	DotTransition     int
	// contains filtered or unexported fields
}

Set of basic lexical items Each entry of Transitions is the next set for the corresponding symbo class. len(ItemSet.Transitions) == ItemSet.SymbolClasses.Size()

func NewItemSet

func NewItemSet(setNo int, lexPart *ast.LexPart, symbols *symbols.Symbols, items ItemList) *ItemSet

func (*ItemSet) Action

func (this *ItemSet) Action() Action

func (*ItemSet) Add

func (this *ItemSet) Add(items ...*Item)

Add will return the number of deduplicated items added to the ItemSet.

func (*ItemSet) Contain

func (this *ItemSet) Contain(that *Item) bool

func (*ItemSet) Empty

func (this *ItemSet) Empty() bool

func (*ItemSet) Equal

func (this *ItemSet) Equal(items ItemList) bool

func (*ItemSet) Next

func (this *ItemSet) Next(rng CharRange) ItemList

See algorithm: set.Next() in package doc.

func (*ItemSet) NextDot

func (this *ItemSet) NextDot() ItemList

func (*ItemSet) NextImport

func (this *ItemSet) NextImport(imprt string) ItemList

func (*ItemSet) Size

func (this *ItemSet) Size() int

func (*ItemSet) String

func (this *ItemSet) String() string

func (*ItemSet) StringItems

func (this *ItemSet) StringItems() string

type ItemSets

type ItemSets struct {
	// contains filtered or unexported fields
}

func GetItemSets

func GetItemSets(lexPart *ast.LexPart) *ItemSets

func (*ItemSets) Add

func (this *ItemSets) Add(items ItemList) (setNo int)

func (*ItemSets) Closure

func (this *ItemSets) Closure() *ItemSets

func (*ItemSets) Contain

func (this *ItemSets) Contain(items ItemList) (yes bool, index int)

func (*ItemSets) List

func (this *ItemSets) List() []*ItemSet

func (*ItemSets) Size

func (this *ItemSets) Size() int

func (*ItemSets) String

func (this *ItemSets) String() string

func (*ItemSets) Symbols

func (this *ItemSets) Symbols() *symbols.Symbols

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL