automaton

package
v0.0.0-...-d0be9ee Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 10, 2015 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Index

Constants

View Source
const (
	REGEXP_UNION         = Kind(1)
	REGEXP_CONCATENATION = Kind(2)
	REGEXP_INTERSECTION  = Kind(3)
	REGEXP_OPTIONAL      = Kind(4)
	REGEXP_REPEAT        = Kind(5)
	REGEXP_REPEAT_MIN    = Kind(6)
	REGEXP_REPEAT_MINMAX = Kind(7)
	REGEXP_COMPLEMENT    = Kind(8)
	REGEXP_CHAR          = Kind(9)
	REGEXP_CHAR_RANGE    = Kind(10)
	REGEXP_ANYCHAR       = Kind(11)
	REGEXP_EMPTY         = Kind(12)
	REGEXP_STRING        = Kind(13)
	REGEXP_ANYSTRING     = Kind(14)
	REGEXP_AUTOMATON     = Kind(15)
	REGEXP_INTERVAL      = Kind(16)
)
View Source
const (
	INTERSECTION = 0x0001 // &
	COMPLEMENT   = 0x0002 // ~
	EMPTY        = 0x0004 // #
	ANYSTRING    = 0x0008 // @
	AUTOMATON    = 0x0010 // <identifier>
	INTERVAL     = 0x0020 // <n-m>
	ALL          = 0xffff // enables all optional regexp syntax.
	NONE         = 0x0000 // enables no optional regexp syntax.

)

Syntax flags

View Source
const HASHMAP_CUTOVER = 30
View Source
const MIN_CODE_POINT = 0

Go doesn't have unicode.MinRune which should be 0

View Source
const TREE_MAP_CUTOVER = 30

If we hold more than this many states, we swtich from O(N*2) linear ops to O(N log(N)) TreeMap

Variables

This section is empty.

Functions

This section is empty.

Types

type Automaton

type Automaton struct {
	// contains filtered or unexported fields
}

Represents an automaton and all its states and transitions. States are integers and must be created using {@link #createState}. Mark a state as an accept state using {@link #setAccept}. Add transitions using {@link #addTransition}. Each state must have all of its transitions added at once; if this is too restrictive then use {@link Automaton.Builder} instead. State 0 is always the initial state. Once a state is finished, either because you've starting adding transitions to another state or you call {@link #finishState}, then that states transitions are sorted (first by min, then max, then dest) and reduced (transitions with adjacent labels going to the same dest are combined).

func MakeEmpty

func MakeEmpty() *Automaton

Returns a new (deterministic) automaton with the empty language.

func (*Automaton) IsAccept

func (a *Automaton) IsAccept(state int) bool

Returns true if this state is an accept state.

func (*Automaton) String

func (a *Automaton) String() string

type AutomatonBuilder

type AutomatonBuilder struct {
	// contains filtered or unexported fields
}

type AutomatonProvider

type AutomatonProvider func(name string) *Automaton

Automaton provider for RegExp.

type CharacterRunAutomaton

type CharacterRunAutomaton struct {
	*RunAutomaton
}

Automaton representation for matching []char

func NewCharacterRunAutomaton

func NewCharacterRunAutomaton(a *Automaton) *CharacterRunAutomaton

type DaciukMihovAutomatonBuilder

type DaciukMihovAutomatonBuilder struct {
}

type FrozenIntSet

type FrozenIntSet struct {
	// contains filtered or unexported fields
}

func (*FrozenIntSet) String

func (fis *FrozenIntSet) String() string

type IntPair

type IntPair struct {
	// contains filtered or unexported fields
}

type Kind

type Kind int

type PointTransitionSet

type PointTransitionSet struct {
	// contains filtered or unexported fields
}

func (*PointTransitionSet) String

func (pts *PointTransitionSet) String() string

type PointTransitions

type PointTransitions struct {
	// contains filtered or unexported fields
}

Holds all transitions that start on this int point, or end at this point-1

type PointTransitionsArray

type PointTransitionsArray []*PointTransitions

func (PointTransitionsArray) Len

func (a PointTransitionsArray) Len() int

func (PointTransitionsArray) Less

func (a PointTransitionsArray) Less(i, j int) bool

func (PointTransitionsArray) Swap

func (a PointTransitionsArray) Swap(i, j int)

type RegExp

type RegExp struct {
	// contains filtered or unexported fields
}

Regular Expression extension to Automaton.

Regular expressions are built from the following abstract syntax:

	regexp	::= unionexp
 			|
	unionexp ::= interexp | unionexp 	(union)
	 		| interexp
	interexp ::= concatexp & interexp 	(intersection) 						[OPTIONAL]
 			| concatexp
	concatexp ::= repeatexp concatexp	 (concatenation)
 			| repeatexp
	repeatexp ::= repeatexp ? 			(zero or one occurrence)
 			| repeatexp * 				(zero or more occurrences)
 			| repeatexp + 				(one or more occurrences)
 			| repeatexp {n} 			(n occurrences)
 			| repeatexp {n,} 			(n or more occurrences)
 			| repeatexp {n,m} 			(n to m occurrences, including both)
 			| complexp
	complexp ::= ~ complexp 			(complement) 						[OPTIONAL]
 			| charclassexp
	charclassexp ::= [ charclasses ] 	(character class)
 			| [^ charclasses ] 			(negated character class)
 			| simpleexp
	charclasses ::= charclass charclasses
 			| charclass
	charclass ::= charexp - charexp 	(character range, including end-points)
 			| charexp
	simpleexp ::= charexp
 			| . 						(any single character)
 			| # 						(the empty language) 				[OPTIONAL]
 			| @ 						(any string) 						[OPTIONAL]
			| " <Unicode string without double-quotes>  " (a string)
 			| ( ) 						(the empty string)
 			| ( unionexp ) 				(precedence override)
 			| < <identifier> > 			(named automaton) 					[OPTIONAL]
			| <n-m> 					(numerical interval) 				[OPTIONAL]
	charexp ::= <Unicode character> 	(a single non-reserved character)
 			| \ <Unicode character>  	(a single character)

The productions marked [OPTIONAL] are only allowed if specified by the syntax flags passed to the RegExp constructor. The reserved characters used in the (enabled) syntax must be escaped with backslash (\) or double-quotes ("..."). (In contrast to other regexp syntaxes, this is required also in character classes.) Be aware that dash (-) has a special meaning in charclass expressions. An identifier is a string not containing right angle bracket (>) or dash (-). Numerical intervals are specified by non-negative decimal integers and include both end points, and if n and m have the same number of digits, then the conforming strings must have that length (i.e. prefixed by 0's).

func NewRegExp

func NewRegExp(s string) *RegExp

Constructs new RegExp from a string. Same as RegExp(s, ALL)

func NewRegExpWithFlag

func NewRegExpWithFlag(s string, flags int) *RegExp

Constructs new RegExp from a string.

func (*RegExp) String

func (re *RegExp) String() string

Constructs string from parsed regular expression

func (*RegExp) ToAutomaton

func (re *RegExp) ToAutomaton() *Automaton

Constructs new Automaton from this RegExp. Same as ToAutomaton(nil) (empty automaton map).

type RunAutomaton

type RunAutomaton struct {
	// contains filtered or unexported fields
}

Finite-state automaton with fast run operation.

func (*RunAutomaton) String

func (ra *RunAutomaton) String() string

type SortedIntSet

type SortedIntSet struct {
	// contains filtered or unexported fields
}

Just holds a set of []int states, plus a corresponding []int count per state. Used by determinize().

I have to disable hashCode and use string key to mimic Lucene's custom hashing function here.

func (*SortedIntSet) String

func (sis *SortedIntSet) String() string

type StateList

type StateList struct {
	// contains filtered or unexported fields
}

type StateListNode

type StateListNode struct {
	// contains filtered or unexported fields
}

type StatePair

type StatePair struct {
	// contains filtered or unexported fields
}

Pair of states.

type Transition

type Transition struct {
	// contains filtered or unexported fields
}

Holds one transition from an {@link Automaton}. This is typically used temporarily when iterating through transitions by invoking {@link Automaton#initTransition} and {@link Automaton#getNextTransition}.

func (*Transition) String

func (t *Transition) String() string

type TransitionList

type TransitionList struct {
	// contains filtered or unexported fields
}

Simple custom []*Transition

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL