gopapageno

package module
v0.0.0-...-fa1654f Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 2, 2024 License: GPL-2.0 Imports: 11 Imported by: 0

README

Go PAPAGENO

Go PAPAGENO (PArallel PArser GENeratOr) is a parallel parser generator based on Floyd's Operator Precedence Grammars.

It generates parallel Go parsers starting from a lexer and a grammar specification. These specification files resemble Flex and Bison ones, although with some differences.

GoPAPAGENO is able to either generate type stubs to be integrated in a Go project, or completely self-contained programs that can be used without further effort.

This work is based on Papageno, a C parallel parser generator.

Authors and Contributors

Documentation

Index

Constants

View Source
const DefaultAverageTokenLength int = 4
View Source
const DefaultParallelFactor float64 = 0.5

Variables

View Source
var (
	ErrInvalid = errors.New("invalid character")
)

Functions

This section is empty.

Types

type AOPParser

type AOPParser struct {
	*OPParser
}

func NewAOPParser

func NewAOPParser(g *Grammar, src []byte, opts *RunOptions) *AOPParser

type COPPStack

type COPPStack struct {
	StatesLOS       *LOS[CyclicAutomataState]
	StateTokenStack *stack[*Token]

	State CyclicAutomataState

	ProducedTokens map[*Token]*Token
	// contains filtered or unexported fields
}

func NewCOPPStack

func NewCOPPStack(tokenStackPool *Pool[stack[*Token]], stateStackPool *Pool[stack[CyclicAutomataState]], producedTokens map[*Token]*Token) *COPPStack

NewCOPPStack creates a new COPPStack initialized with one empty stack.

func (*COPPStack) AppendStateToken

func (s *COPPStack) AppendStateToken(token *Token)

func (*COPPStack) Combine

func (s *COPPStack) Combine() *COPPStack

func (*COPPStack) CombineLOS

func (s *COPPStack) CombineLOS(pool *Pool[stack[Token]]) (*LOS[Token], map[*Token]*Token)

func (*COPPStack) Current

func (s *COPPStack) Current() []*Token

func (COPPStack) FirstTerminal

func (s COPPStack) FirstTerminal() *Token

FirstTerminal returns a pointer to the first terminal token on the stack.

func (*COPPStack) IsCurrentSingleNonterminal

func (s *COPPStack) IsCurrentSingleNonterminal() bool

func (*COPPStack) Iterator

func (s *COPPStack) Iterator() *CyclicParserStackIterator

func (*COPPStack) LastNonterminal

func (s *COPPStack) LastNonterminal() (*Token, error)

func (*COPPStack) Pop

func (s *COPPStack) Pop() *Token

func (*COPPStack) Pop2

func (s *COPPStack) Pop2() (*Token, *CyclicAutomataState)

func (*COPPStack) Previous

func (s *COPPStack) Previous() []*Token

func (*COPPStack) Push

func (s *COPPStack) Push(token *Token) *Token

func (*COPPStack) PushWithState

func (s *COPPStack) PushWithState(token *Token, state CyclicAutomataState) *Token

func (*COPPStack) SwapState

func (s *COPPStack) SwapState()

func (COPPStack) UpdateFirstTerminal

func (s COPPStack) UpdateFirstTerminal()

UpdateFirstTerminal should be used after a reduction in order to update the first terminal counter. In fact, in order to save some time, only the Push operation automatically updates the first terminal pointer, while the Pop operation does not.

func (*COPPStack) YieldingPrecedence

func (s *COPPStack) YieldingPrecedence() int

type COPParser

type COPParser struct {
	// contains filtered or unexported fields
}

COPParser implements parsing using a simplified approach to C-OPGs.

func NewCOPParser

func NewCOPParser(g *Grammar, src []byte, opts *RunOptions) *COPParser

NewCOPParser allocates all required resources for a COPParser to be usable.

func (*COPParser) CombineSweepLOS

func (p *COPParser) CombineSweepLOS(pool *Pool[stack[Token]], stacks []*COPPStack) (*LOS[Token], map[*Token]*Token)

func (*COPParser) Parse

func (p *COPParser) Parse(ctx context.Context, tokensLists []*LOS[Token]) (*Token, error)

Parse performs C-OPG parsing of the provided tokensLists, returning the root of the resulting parse tree.

type Constructor

type Constructor[T any] func() *T

type CyclicAutomataState

type CyclicAutomataState struct {
	CurrentIndex int
	CurrentLen   int

	PreviousIndex int
	PreviousLen   int
}

type CyclicParserStackIterator

type CyclicParserStackIterator struct {
	TokensIt *LOPSIt[Token]
	StatesIt *LOSIt[CyclicAutomataState]
}

func (*CyclicParserStackIterator) Cur

func (*CyclicParserStackIterator) IsLast

func (i *CyclicParserStackIterator) IsLast() bool

func (*CyclicParserStackIterator) Next

type Grammar

type Grammar struct {
	NumTerminals    uint16
	NumNonterminals uint16

	MaxRHSLength    int
	Rules           []Rule
	CompressedRules []uint16

	MaxPrefixLength    int
	Prefixes           [][]TokenType
	CompressedPrefixes []uint16

	PrecedenceMatrix          [][]Precedence
	BitPackedPrecedenceMatrix []uint64

	Func         ParserFunc
	PreambleFunc PreambleFunc

	ParsingStrategy ParsingStrategy
}

func (*Grammar) Parser

func (g *Grammar) Parser(src []byte, opts *RunOptions) Parser

type LOPS

type LOPS[T any] struct {
	// contains filtered or unexported fields
}

LOPS is a list of pointer stacks.

func NewLOPS

func NewLOPS[T any](pool *Pool[stack[*T]]) *LOPS[T]

NewLOPS creates a new LOPS initialized with an empty stack.

func (*LOPS[T]) Capacity

func (l *LOPS[T]) Capacity() int

Capacity returns the maximum capacity of the current allocated structure.

func (*LOPS[T]) Clear

func (l *LOPS[T]) Clear()

Clear empties the LOPS.

func (*LOPS[T]) Get

func (l *LOPS[T]) Get() *T

Get returns the topmost element from the LOPS.

func (*LOPS[T]) HeadIterator

func (l *LOPS[T]) HeadIterator() *LOPSIt[T]

HeadIterator returns an iterator initialized to point before the first element of the list.

func (*LOPS[T]) Length

func (l *LOPS[T]) Length() int

Length returns the number of items contained in the LOPS. It takes constant time to execute.

func (*LOPS[T]) MaxLength

func (l *LOPS[T]) MaxLength() int

MaxLength returns the maximum occupancy of the data structure so far, i.e. what is the maximum amount of items in use at any given time.

func (*LOPS[T]) NumStacks

func (l *LOPS[T]) NumStacks() int

NumStacks returns the number of stacks contained in the LOPS. It takes linear time (in the number of stacks) to execute.

func (*LOPS[T]) Pop

func (l *LOPS[T]) Pop() *T

Pop removes the topmost element from the LOPS and returns it.

func (*LOPS[T]) Push

func (l *LOPS[T]) Push(t *T) *T

Push adds an element to the LOPS. By default, the element is added to the current stack; if that is full, a new one is obtained from the pool.

type LOPSIt

type LOPSIt[T any] struct {
	// contains filtered or unexported fields
}

LOPSIt allows to iterate over a LOPS, either forward or backwards.

func (*LOPSIt[T]) Cur

func (i *LOPSIt[T]) Cur() *T

Cur returns a pointer to the current element. It returns nil if it points before the first element or after the last element of the list.

func (*LOPSIt[T]) IsLast

func (i *LOPSIt[T]) IsLast() bool

func (*LOPSIt[T]) Next

func (i *LOPSIt[T]) Next() *T

Next moves the iterator one position forward and returns a pointer to the new current element. It returns nil if it points after the last element of the list.

func (*LOPSIt[T]) Prev

func (i *LOPSIt[T]) Prev() *T

Prev moves the iterator one position backward and returns a pointer to the new current element. It returns nil if it points before the first element of the list.

type LOS

type LOS[T any] struct {
	// contains filtered or unexported fields
}

LOS is a list of stacks.

func NewLOS

func NewLOS[T any](pool *Pool[stack[T]]) *LOS[T]

NewLOS creates a new LOS initialized with an empty stack.

func (*LOS[T]) Clear

func (l *LOS[T]) Clear()

Clear empties the LOS.

func (*LOS[T]) Get

func (l *LOS[T]) Get() *T

Get returns the topmost element from the LOS.

func (*LOS[T]) GetNext

func (l *LOS[T]) GetNext() *T

GetNext returns the first empty element from the LOS.

func (*LOS[T]) HeadIterator

func (l *LOS[T]) HeadIterator() *LOSIt[T]

HeadIterator returns an iterator initialized to point before the first element of the list.

func (*LOS[T]) Length

func (l *LOS[T]) Length() int

Length returns the number of items contained in the LOS. It takes constant time to execute.

func (*LOS[T]) Merge

func (l *LOS[T]) Merge(other *LOS[T])

Merge links the stacks of the current and of another LOS.

func (*LOS[T]) NumStacks

func (l *LOS[T]) NumStacks() int

NumStacks returns the number of stacks contained in the LOS. It takes linear time (in the number of stacks) to execute.

func (*LOS[T]) Pop

func (l *LOS[T]) Pop() *T

Pop removes the topmost element from the LOS and returns it.

func (*LOS[T]) Push

func (l *LOS[T]) Push(t T) *T

Push adds an element to the LOS. By default, the element is added to the current stack; if that is full, a new one is obtained from the pool.

func (*LOS[T]) Split

func (l *LOS[T]) Split(n int) ([]*LOS[T], error)

Split splits a LOS into a slice of LOS of length n. The original LOS should not be used after this operation.

type LOSIt

type LOSIt[T any] struct {
	// contains filtered or unexported fields
}

LOSIt allows to iterate over a LOS, either forward or backwards.

func (*LOSIt[T]) Cur

func (i *LOSIt[T]) Cur() *T

Cur returns a pointer to the current element. It returns nil if it points before the first element or after the last element of the list.

func (*LOSIt[T]) IsLast

func (i *LOSIt[T]) IsLast() bool

func (*LOSIt[T]) Next

func (i *LOSIt[T]) Next() *T

Next moves the iterator one position forward and returns a pointer to the new current element. It returns nil if it points after the last element of the list.

type LexResult

type LexResult uint8
const (
	LexOK LexResult = iota
	LexSkip
	LexErr
	LexEOF
)

type Lexer

type Lexer struct {
	Automaton          LexerDFA
	CutPointsAutomaton LexerDFA
	Func               LexerFunc

	PreambleFunc PreambleFunc
}

func (*Lexer) Scanner

func (l *Lexer) Scanner(src []byte, opts *RunOptions) *Scanner

type LexerDFA

type LexerDFA []LexerDFAState

type LexerDFAState

type LexerDFAState struct {
	Transitions     [256]int
	IsFinal         bool
	AssociatedRules []int
}

type LexerFunc

type LexerFunc func(rule int, text string, start int, end int, thread int, token *Token) LexResult

type OPPStack

type OPPStack struct {
	// contains filtered or unexported fields
}

OPPStack is the data structure used by OPP and AOPP workers during parsing.

func NewOPPStack

func NewOPPStack(pool *Pool[stack[*Token]]) *OPPStack

func (*OPPStack) Combine

func (s *OPPStack) Combine() *OPPStack

func (*OPPStack) CombineLOS

func (s *OPPStack) CombineLOS(pool *Pool[stack[Token]]) *LOS[Token]

func (OPPStack) FirstTerminal

func (s OPPStack) FirstTerminal() *Token

FirstTerminal returns a pointer to the first terminal token on the stack.

func (*OPPStack) LastNonterminal

func (s *OPPStack) LastNonterminal() (*Token, error)

func (*OPPStack) Pop

func (s *OPPStack) Pop() *Token

Pop removes the topmost element from the parserStack and returns it.

func (*OPPStack) Push

func (s *OPPStack) Push(token *Token) *Token

Push pushes a token pointer in the parserStack. It returns the pointer itself.

func (OPPStack) UpdateFirstTerminal

func (s OPPStack) UpdateFirstTerminal()

UpdateFirstTerminal should be used after a reduction in order to update the first terminal counter. In fact, in order to save some time, only the Push operation automatically updates the first terminal pointer, while the Pop operation does not.

func (*OPPStack) YieldingPrecedence

func (s *OPPStack) YieldingPrecedence() int

type OPParser

type OPParser struct {
	// contains filtered or unexported fields
}

func NewOPParser

func NewOPParser(g *Grammar, src []byte, opts *RunOptions) *OPParser

func (*OPParser) CombineSweepLOS

func (p *OPParser) CombineSweepLOS(pool *Pool[stack[Token]], stacks []*OPPStack) *LOS[Token]

func (*OPParser) Parse

func (p *OPParser) Parse(ctx context.Context, tokensLists []*LOS[Token]) (*Token, error)

type Parser

type Parser interface {
	Parse(ctx context.Context, tokensLists []*LOS[Token]) (*Token, error)
}

type ParserFunc

type ParserFunc func(rule uint16, ruleType RuleFlags, lhs *Token, rhs []*Token, thread int)

type ParsingStrategy

type ParsingStrategy uint8
const (
	// OPP is Operator-Precedence Parsing. It is the original parsing ReductionStrategy.
	OPP ParsingStrategy = iota
	// AOPP is Associative Operator Precedence Parsing.
	AOPP
	// COPP is Cyclic Operator Precedence Parsing.
	COPP
)

func (ParsingStrategy) String

func (s ParsingStrategy) String() string

type Pool

type Pool[T any] struct {
	// contains filtered or unexported fields
}

A Pool can be used to preallocate a number of items of type T. It is not thread-safe.

func NewPool

func NewPool[T any](length int, opts ...PoolOpt[T]) *Pool[T]

NewPool creates a new pool, allocating `length` elements.

func (*Pool[T]) Get

func (p *Pool[T]) Get() *T

Get returns an item from the pool if available. Otherwise, it initializes a new one. It is not thread-safe.

func (*Pool[T]) Left

func (p *Pool[T]) Left() int

Left returns the number of items remaining in the pool.

func (*Pool[T]) NumAllocated

func (p *Pool[T]) NumAllocated() int

NumAllocated returns the number of items allocated so far.

type PoolOpt

type PoolOpt[T any] func(p *Pool[T])

func WithConstructor

func WithConstructor[T any](constructor Constructor[T]) PoolOpt[T]

type PreambleFunc

type PreambleFunc func(sourceLen, concurrency int)

type Precedence

type Precedence uint8
const (
	PrecEquals Precedence = iota
	PrecYields
	PrecTakes
	PrecAssociative
	PrecEmpty
)

func (Precedence) String

func (p Precedence) String() string

type ReductionStrategy

type ReductionStrategy uint8

A ReductionStrategy defines which kind of algorithm should be executed when collecting and running multiple parsing passes.

const (
	// ReductionSweep will run a single serial pass after combining data from the first `n` parallel runs.
	ReductionSweep ReductionStrategy = iota
	// ReductionParallel will combine adjacent parsing results and recursively run `n-1` parallel runs until one stack remains.
	ReductionParallel
	// ReductionMixed will run a limited number of parallel passes, then combine the remaining inputs to perform a final serial pass.
	ReductionMixed
)

func (ReductionStrategy) String

func (s ReductionStrategy) String() string

type Rule

type Rule struct {
	Lhs  TokenType
	Rhs  []TokenType
	Type RuleFlags
}

type RuleFlags

type RuleFlags uint8
const (
	RuleSimple RuleFlags = 1 << iota
	RuleAppend
	RuleCombine
	RuleCyclic
	RulePrefix
)

func (RuleFlags) Clear

func (t RuleFlags) Clear(flag RuleFlags) RuleFlags

func (RuleFlags) Has

func (t RuleFlags) Has(flag RuleFlags) bool

func (RuleFlags) Set

func (t RuleFlags) Set(flag RuleFlags) RuleFlags

func (RuleFlags) String

func (t RuleFlags) String() string

func (RuleFlags) Toggle

func (t RuleFlags) Toggle(flag RuleFlags) RuleFlags

type RunOptions

type RunOptions struct {
	Concurrency        int
	InitialConcurrency int
	ReductionStrategy  ReductionStrategy

	AvgTokenLength int
	ParallelFactor float64
	// contains filtered or unexported fields
}

type Runner

type Runner struct {
	Lexer  *Lexer
	Parser *Grammar

	Options RunOptions
}

func NewRunner

func NewRunner(lexer *Lexer, parser *Grammar, opts ...RunnerOpt) *Runner

func (*Runner) Run

func (r *Runner) Run(ctx context.Context, src []byte) (*Token, error)

type RunnerOpt

type RunnerOpt func(p *Runner)

func WithAverageTokenLength

func WithAverageTokenLength(length int) RunnerOpt

func WithCPUProfiling

func WithCPUProfiling(w io.Writer) RunnerOpt

func WithConcurrency

func WithConcurrency(n int) RunnerOpt

func WithGarbageCollection

func WithGarbageCollection(on bool) RunnerOpt

func WithLogging

func WithLogging(logger *log.Logger) RunnerOpt

func WithMemoryProfiling

func WithMemoryProfiling(w io.Writer) RunnerOpt

func WithParallelFactor

func WithParallelFactor(factor float64) RunnerOpt

func WithReductionStrategy

func WithReductionStrategy(strat ReductionStrategy) RunnerOpt

type Scanner

type Scanner struct {
	Lexer *Lexer
	// contains filtered or unexported fields
}

Scanner implements reading and tokenization.

func (*Scanner) Lex

func (s *Scanner) Lex(ctx context.Context) ([]*LOS[Token], error)

type Token

type Token struct {
	Type       TokenType
	Precedence Precedence

	Value any

	Next      *Token
	Child     *Token
	LastChild *Token
}

func (*Token) Height

func (t *Token) Height() int

Height computes the height of the AST rooted in `t`. It can be used as an evaluation metric for tree-balance, as left/right-skewed trees will have a bigger height compared to balanced trees.

func (*Token) IsTerminal

func (t *Token) IsTerminal() bool

func (*Token) Size

func (t *Token) Size() int

Size returns the number of tokens in the AST rooted in `t`.

func (*Token) String

func (t *Token) String() string

String returns a string representation of the AST rooted in `t`. This should be used rarely, as it doesn't print out a proper string representation of the token type. Gopapageno will generate a `SprintToken` function for your tokens.

type TokenType

type TokenType uint16
const (
	TokenEmpty TokenType = 0
	TokenTerm  TokenType = 0x8000
)

func (TokenType) IsTerminal

func (t TokenType) IsTerminal() bool

func (TokenType) Value

func (t TokenType) Value() uint16

Directories

Path Synopsis
cmd
examples
adder/aopp
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
adder/copp
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
adder/opp
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
expr/aopp
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
expr/copp
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
expr/opp
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
json/aopp
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
json/copp
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
json/opp
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
nested
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
xml/opp
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
xpath
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.
ext
regex/refactored
Code generated by Gopapageno; DO NOT EDIT.
Code generated by Gopapageno; DO NOT EDIT.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL