SlParser

package module
v0.1.12 (Latest)
Warning

This package is not in the latest version of its module.

Published: Nov 5, 2024 License: GPL-3.0 Imports: 8 Imported by: 0

README

SLParser

An LALR parser generator (work in progress), done just for fun; it is not meant to be official or anything.

Table of Contents

  1. Table of Contents
  2. Introduction

Introduction

To use SLParser, the desired grammar must be specified in two separate files, each with its own grammar and format: one for the lexer and one for the parser.

EbnfParser

A parser that parses a modified version of the EBNF grammar.

Table of Contents

  1. Table of Contents
  2. Introduction

Introduction

Language Specification

Lexical Elements

Here's a description of the various lexical elements that appear in the grammar.

Identifiers

An identifier is defined as a sequence of one or more letters of the English alphabet (from 'a' to 'z', in either case) that can optionally end with a sequence of one or more decimal digits (from '0' to '9').

This grammar distinguishes between two kinds of identifiers: uppercase and lowercase. A lowercase identifier contains only lowercase letters, while an uppercase identifier starts with an uppercase letter and can contain both uppercase and lowercase letters. Finally, lowercase identifiers can use a single underscore character (_) to separate words.

For example, foo and f are valid lowercase identifiers, while Foo, FooBar, and F are valid uppercase identifiers. By contrast, fo o, 1bar, foo__bar, fooBar, and so on are not valid identifiers.
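The identifier rules above can be checked with two regular expressions. This Go sketch is not taken from SlParser. The patterns are assumptions derived from the prose: underscores are allowed only between lowercase words, digits only at the very end, and uppercase identifiers start with an uppercase letter, as in the examples.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed patterns derived from the prose rules; they are not the package's own code.
var (
	// lowercase words joined by single underscores, then optional trailing digits
	lowercaseID = regexp.MustCompile(`^[a-z]+(_[a-z]+)*[0-9]*$`)
	// an uppercase first letter, any mix of letters, then optional trailing digits
	uppercaseID = regexp.MustCompile(`^[A-Z][A-Za-z]*[0-9]*$`)
)

// classifyIdentifier reports "lowercase", "uppercase", or "invalid" for s.
func classifyIdentifier(s string) string {
	switch {
	case lowercaseID.MatchString(s):
		return "lowercase"
	case uppercaseID.MatchString(s):
		return "uppercase"
	default:
		return "invalid"
	}
}

func main() {
	for _, s := range []string{"foo", "foo_bar", "Foo", "FooBar", "F", "fo o", "1bar", "foo__bar", "fooBar"} {
		fmt.Printf("%-9s %s\n", s, classifyIdentifier(s))
	}
}
```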

Symbols

A symbol is a special character that appears in the grammar.

Punctuation    Name          Description
.              dot           Specifies the end of a rule.

Brackets       Name          Description
( )            parentheses   Specify the start and end of a sub-OR rule (an OR group).

Operators      Name          Description
=              equal         Separates the left-hand side (lhs) from the right-hand side (rhs).
|              pipe          Exclusive or (separates alternatives).

Whitespace     Name          Description
\r\n, \n       newline       Separates multiple rules and/or lines.
\t             tab           Indentation.
" " (space)    ws            Separates elements from each other.

Apart from separating elements, spaces and tabs are ignored in the grammar, so a b and a    b are equivalent.
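The lexical elements can be made concrete with a hand-written tokenizer. The Go sketch below is hypothetical: the Token type and the tokenize function are not part of SlParser. It skips spaces and tabs, turns "\n" and "\r\n" into newline tokens, and maps each symbol to its name from the tables. It also applies the identifier rules, rejecting inputs such as fooBar or foo__bar that run into characters an identifier cannot continue with.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical token type used only in this sketch.
type Token struct {
	Kind  string // symbol or identifier name, e.g. "dot" or "uppercase_id"
	Value string // the matched text
}

// symbolKinds maps each single-character symbol from the tables above to its name.
var symbolKinds = map[byte]string{
	'.': "dot", '(': "op_paren", ')': "cl_paren", '=': "equal", '|': "pipe",
}

func isLower(c byte) bool { return c >= 'a' && c <= 'z' }
func isUpper(c byte) bool { return c >= 'A' && c <= 'Z' }
func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// isIdentByte reports whether c could continue an identifier.
func isIdentByte(c byte) bool { return isLower(c) || isUpper(c) || isDigit(c) || c == '_' }

// tokenize splits src into tokens, skipping spaces and tabs.
func tokenize(src string) ([]Token, error) {
	var toks []Token
	for i := 0; i < len(src); {
		c := src[i]
		start := i
		switch {
		case c == ' ' || c == '\t':
			i++ // whitespace only separates elements
			continue
		case c == '\n':
			i++
			toks = append(toks, Token{"newline", src[start:i]})
			continue
		case strings.HasPrefix(src[i:], "\r\n"):
			i += 2
			toks = append(toks, Token{"newline", src[start:i]})
			continue
		case symbolKinds[c] != "":
			i++
			toks = append(toks, Token{symbolKinds[c], src[start:i]})
			continue
		case isLower(c):
			// lowercase words, joined by single underscores
			for i < len(src) && (isLower(src[i]) || (src[i] == '_' && i+1 < len(src) && isLower(src[i+1]))) {
				i++
			}
		case isUpper(c):
			for i < len(src) && (isLower(src[i]) || isUpper(src[i])) {
				i++
			}
		default:
			return nil, fmt.Errorf("unexpected character %q at offset %d", c, i)
		}
		// Only identifiers reach this point: consume the optional trailing digits.
		for i < len(src) && isDigit(src[i]) {
			i++
		}
		if i < len(src) && isIdentByte(src[i]) {
			return nil, fmt.Errorf("invalid identifier starting at offset %d", start)
		}
		kind := "lowercase_id"
		if isUpper(c) {
			kind = "uppercase_id"
		}
		toks = append(toks, Token{kind, src[start:i]})
	}
	return toks, nil
}

func main() {
	toks, err := tokenize("Color\n\t= red\n\t| green\n\t.")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range toks {
		fmt.Printf("%-12s %q\n", t.Kind, t.Value)
	}
}
```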

Source

Overview

In this context, the term "source" refers to the file containing the EBNF grammar.

Syntax

Here's the syntax of the source file:

Source = Rule { "\n" Rule } EOF .

Where:

  • Rule refers to the rules of the grammar.
  • EOF is a special symbol that indicates the end of the file; outside of the rules, nothing else is allowed.

In essence, a source file is a sequence of one or more rules, separated from each other by one or more newline characters (\n), that is read until the end of the file.

Rule

Overview

A rule is the core of any grammar: it describes how the input should be parsed.

Syntax

Here's the syntax of a rule:

Rule     = SlRule | MlRule .
SlRule   = uppercase_id "=" RhsCls "." .
MlRule   = uppercase_id "\n" LineRule "\n." .
LineRule = "=" RhsCls { "\n|" RhsCls } .
RhsCls   = Rhs { Rhs } .

Where:

  • uppercase_id refers to an uppercase identifier.
  • Rhs refers to the right-hand side of the rule.

In essence, a rule can be either a single-line rule or a multi-line rule. In a single-line rule, the uppercase identifier is followed by an equal sign (=), the right-hand side clause, and a dot (.). In a multi-line rule, the uppercase identifier is followed by one or more right-hand side clauses, one per line. Each of these lines is indented one level; the first one starts with an equal sign (=) and every following one starts with a pipe (|). Finally, the dot (.) is written on its own line, also indented one level.

Examples

Here are some examples of valid rules:

Color
   = red
   | green
   | blue
   .

This rule states that a color can either be "red", "green", or "blue".

Person = name age .

This rule states that a person has a name followed by an age.
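The two layouts can be made concrete with a small Go sketch that renders a rule in either form. formatRule is a hypothetical helper, not part of SlParser. It assumes one tab per indentation level, matching the tab token in the multi-line Rule of the lexer grammar further down; the examples above happen to show spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// formatRule is a hypothetical helper that renders a rule in the layout described
// above. It assumes at least one alternative and one tab per indentation level.
// Using the single-line form for exactly one alternative is this sketch's own choice.
func formatRule(name string, alternatives []string) string {
	if len(alternatives) == 1 {
		return name + " = " + alternatives[0] + " ."
	}
	var b strings.Builder
	b.WriteString(name)
	for i, alt := range alternatives {
		sep := "|" // later lines start with a pipe
		if i == 0 {
			sep = "=" // only the first line starts with an equal sign
		}
		fmt.Fprintf(&b, "\n\t%s %s", sep, alt)
	}
	b.WriteString("\n\t.") // the closing dot sits on its own indented line
	return b.String()
}

func main() {
	fmt.Println(formatRule("Person", []string{"name age"}))
	fmt.Println(formatRule("Color", []string{"red", "green", "blue"}))
}
```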

Right-hand Side

Overview

A right-hand side is the basic unit of the grammar: it specifies the individual atoms that make up a rule.

Syntax

Here's the syntax of a right-hand side:

Rhs        = Identifier | OrGroup .
Identifier = uppercase_id | lowercase_id .
OrGroup    = "(" OrExpr ")" .
OrExpr     = Identifier "|" Identifier { "|" Identifier } .

In essence, a right-hand side can be either an identifier or an OR group. An identifier is any lowercase or uppercase identifier, while an OR group is an OR expression surrounded by parentheses (( and )). Finally, an OR expression is a sequence of two or more identifiers separated by pipes (|).
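The constraints on an OR group (surrounding parentheses, at least two alternatives, no empty alternative) can be checked on a source fragment. parseOrGroup is a hypothetical Go helper, not SlParser's code. It deliberately skips validating the identifiers themselves.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseOrGroup is a hypothetical helper: it checks OrGroup = "(" OrExpr ")" on a
// source fragment and returns the identifiers of the OR expression.
// Identifier syntax itself is not validated here.
func parseOrGroup(s string) ([]string, error) {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "(") || !strings.HasSuffix(s, ")") {
		return nil, errors.New("an OR group must be surrounded by parentheses")
	}
	parts := strings.Split(s[1:len(s)-1], "|")
	if len(parts) < 2 {
		return nil, errors.New("an OR expression needs at least two identifiers")
	}
	ids := make([]string, 0, len(parts))
	for _, part := range parts {
		id := strings.TrimSpace(part)
		if id == "" {
			return nil, errors.New("empty alternative in OR expression")
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	fmt.Println(parseOrGroup("(red | green | blue)"))
	fmt.Println(parseOrGroup("(red)"))
}
```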

Parsing

Full Grammar

equal = "=" .
dot = "." .
pipe = "|" .
newline = [ "\r" ] "\n" { [ "\r" ] "\n" } .
ws = " " | "\t" . -> skip
op_paren = "(" .
cl_paren = ")" .

uppercase_id = uppercase_word { uppercase_word } { digit } .
lowercase_id = lowercase_word { digit } .

fragment lowercase_word = "a".."z" { "a".."z" } . 
fragment uppercase_word = "A".."Z" { "a".."z" } .
fragment digit = "0".."9" .

Source = Source1 EOF .
Source1 = Rule .
Source1 = Rule newline Source1 .

Rule = uppercase_id equal RhsCls dot .
Rule = uppercase_id newline equal RhsCls RuleLine .
RuleLine = newline pipe RhsCls RuleLine .
RuleLine = newline dot  .

RhsCls = Rhs .
RhsCls = Rhs RhsCls .

Rhs = Identifier .
Rhs = op_paren OrExpr cl_paren .

OrExpr = Identifier pipe Identifier .
OrExpr = Identifier pipe OrExpr .

Identifier = uppercase_id .
Identifier = lowercase_id .
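SlParser generates an LALR parser from grammars like the one above. The following Go sketch swaps in a different technique to show what the productions accept: a hand-written recursive-descent recognizer over token kinds. The right-recursive rules (Source1, RuleLine, RhsCls, OrExpr) become loops. The parser type and the recognize function are hypothetical, and the recognizer only validates input; it does not build a parse tree.

```go
package main

import "fmt"

// identKinds lists the kinds accepted by Identifier = uppercase_id | lowercase_id .
var identKinds = []string{"uppercase_id", "lowercase_id"}

// parser is a hypothetical recognizer over token kinds; it stops at the first error.
type parser struct {
	kinds []string
	pos   int
	err   error
}

// peek returns the current token kind, or "EOF" past the end of input.
func (p *parser) peek() string {
	if p.pos < len(p.kinds) {
		return p.kinds[p.pos]
	}
	return "EOF"
}

// expect consumes one token of any of the given kinds, or records an error.
func (p *parser) expect(kinds ...string) {
	if p.err != nil {
		return
	}
	for _, k := range kinds {
		if p.peek() == k {
			p.pos++
			return
		}
	}
	p.err = fmt.Errorf("token %d: expected %v, got %s", p.pos, kinds, p.peek())
}

// Source = Rule { newline Rule } EOF .  (Source1 unrolled into a loop)
func (p *parser) source() {
	p.rule()
	for p.err == nil && p.peek() == "newline" {
		p.pos++
		p.rule()
	}
	p.expect("EOF")
}

// Rule = uppercase_id equal RhsCls dot .
// Rule = uppercase_id newline equal RhsCls RuleLine .
// RuleLine = newline pipe RhsCls RuleLine | newline dot .
func (p *parser) rule() {
	p.expect("uppercase_id")
	if p.err == nil && p.peek() == "equal" {
		p.pos++
		p.rhsCls()
		p.expect("dot")
		return
	}
	p.expect("newline")
	p.expect("equal")
	p.rhsCls()
	for p.err == nil {
		p.expect("newline")
		if p.err == nil && p.peek() == "dot" {
			p.pos++
			return
		}
		p.expect("pipe")
		p.rhsCls()
	}
}

// RhsCls = Rhs { Rhs } .
func (p *parser) rhsCls() {
	p.rhs()
	for p.err == nil && (p.peek() == "uppercase_id" || p.peek() == "lowercase_id" || p.peek() == "op_paren") {
		p.rhs()
	}
}

// Rhs = Identifier | op_paren OrExpr cl_paren .
// OrExpr = Identifier pipe Identifier { pipe Identifier } .
func (p *parser) rhs() {
	if p.err != nil {
		return
	}
	if p.peek() != "op_paren" {
		p.expect(identKinds...)
		return
	}
	p.pos++
	p.expect(identKinds...)
	p.expect("pipe")
	p.expect(identKinds...)
	for p.err == nil && p.peek() == "pipe" {
		p.pos++
		p.expect(identKinds...)
	}
	p.expect("cl_paren")
}

// recognize returns nil if the token kinds form a valid source file.
func recognize(kinds []string) error {
	p := &parser{kinds: kinds}
	p.source()
	return p.err
}

func main() {
	fmt.Println(recognize([]string{"uppercase_id", "equal", "lowercase_id", "dot"}))
	fmt.Println(recognize([]string{"uppercase_id", "equal", "dot"}))
}
```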

Lexer Grammar

equal = "=" .
dot = "." .
newline = [ "\r" ] "\n" .
quote = "\"" .
backslash = "\\" .
pipe = "|" .
tab = "\t" .
ws = " " .
right_arrow = "->" .
skip = "skip" .
range = ".." .

id = "a".."z" { "a".."z" } .

// \u0000..\u0021, "\"", \u0023..\u005B, "\\", \u005D..\uFFFF
char
   = \u0000..\u0021 | \u0023..\u005B | \u005D..\uFFFF
   | backslash ( quote | backslash )
   .

Source = Rule { newline { newline } Rule } EOF .

Rule = id ws equal ws Rhs ws dot [ SkipCls ] .

Rule = id newline tab equal ws Rhs { newline tab pipe ws Rhs } newline tab dot [ SkipCls ] .

Rhs
   = quote char quote
   | Range
   .

Range
   = quote char quote range quote char quote
   .

SkipCls
   = right_arrow skip
   .
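The char, Rhs, and Range rules of the lexer grammar can be sketched as a small Go parser. A single quoted character is returned as the one-element range [c, c], and "lo".."hi" is returned as a range. All names here are hypothetical. The sketch follows the char rule literally, so only \" and \\ escapes are accepted; escapes such as \n, which the newline rule above uses, would need an extra case. Rejecting reversed ranges such as "z".."a" is an added assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// parseChar decodes one `char` at the start of s and returns it with the remaining input:
// any rune in \u0000..\uFFFF except '"' and '\\', or an escaped '"' or '\\'.
// Invalid UTF-8 decodes to U+FFFD and is accepted in this sketch.
func parseChar(s string) (rune, string, error) {
	r, size := utf8.DecodeRuneInString(s)
	switch {
	case size == 0:
		return 0, s, errors.New("unexpected end of input")
	case r == '\\':
		if len(s) < 2 || (s[1] != '"' && s[1] != '\\') {
			return 0, s, errors.New(`only \" and \\ escapes are allowed`)
		}
		return rune(s[1]), s[2:], nil
	case r == '"':
		return 0, s, errors.New("unescaped quote")
	case r > 0xFFFF:
		return 0, s, fmt.Errorf("rune %U is outside \\u0000..\\uFFFF", r)
	}
	return r, s[size:], nil
}

// parseQuoted parses `quote char quote`.
func parseQuoted(s string) (rune, string, error) {
	if !strings.HasPrefix(s, `"`) {
		return 0, s, errors.New("expected opening quote")
	}
	r, rest, err := parseChar(s[1:])
	if err != nil {
		return 0, s, err
	}
	if !strings.HasPrefix(rest, `"`) {
		return 0, s, errors.New("expected closing quote")
	}
	return r, rest[1:], nil
}

// parseRhs parses a whole Rhs and returns the inclusive range it denotes.
func parseRhs(s string) (rune, rune, error) {
	lo, rest, err := parseQuoted(s)
	if err != nil {
		return 0, 0, err
	}
	if rest == "" {
		return lo, lo, nil // a single quoted char
	}
	if !strings.HasPrefix(rest, "..") {
		return 0, 0, fmt.Errorf("unexpected %q after literal", rest)
	}
	hi, rest, err := parseQuoted(rest[2:])
	if err != nil {
		return 0, 0, err
	}
	if rest != "" {
		return 0, 0, fmt.Errorf("unexpected %q after range", rest)
	}
	if lo > hi {
		return 0, 0, fmt.Errorf("empty range %q..%q", lo, hi) // assumption: reversed ranges are errors
	}
	return lo, hi, nil
}

func main() {
	for _, src := range []string{`"="`, `"a".."z"`, `"\""`, `"ab"`} {
		lo, hi, err := parseRhs(src)
		fmt.Printf("%-10s %q %q %v\n", src, lo, hi, err)
	}
}
```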

equal = "=" .
dot = "." .
pipe = "|" .
newline = [ "\r" ] "\n" .
tab = "\t" .
ws = " " . -> skip
op_paren = "(" .
cl_paren = ")" .

uppercase_id = "A".."Z" { "a".."z" } .
lowercase_id = "a".."z" { "a".."z" } .

Parser Grammar

Documentation

Index

Constants

This section is empty.

Variables

var (
	// ErrMissingData is the error that is returned when data is missing.
	ErrMissingData error

	// ErrMissingTokens is the error that is returned when tokens are missing.
	ErrMissingTokens error

	// ErrMissingParseTree is the error that is returned when the parse tree is missing.
	ErrMissingParseTree error
)

Functions

func ApplyResults added in v0.1.11

func ApplyResults[T interface {
	SetError(err error) T
}, R any](parent T, elems []R, err error, fn ModifyFn[T, R]) ([]T, error)

ApplyResults applies a function to each element in the slice and returns a new slice of results.

Parameters:

  • parent: The parent result.
  • elems: The slice of elements to apply the function to.
  • err: The error to set on each result.
  • fn: The function to apply to each element.

Returns:

  • []T: A slice of results with the function applied to each element.
  • error: An error if the function is nil.

func HasTree added in v0.1.11

func HasTree(results []Result, target *slpx.ParseTree) bool

func MakeEvaluate added in v0.1.11

func MakeEvaluate(lexer *sllx.Lexer, parser slpx.Parser, ast map[string]gr.ToASTFn) (evrsl.ApplyOnValidsFn[Result], error)

MakeEvaluate creates an evaluator function that takes a sequence of SlParser results and returns a new sequence of SlParser results. The new sequence is computed by first lexing the input, then parsing the lexer output, and finally converting the parse output into an AST.

Parameters:

  • lexer: The lexer to use for lexing.
  • parser: The parser to use for parsing.
  • ast: The AST maker to use for converting the parse output to an AST.

Returns:

  • evrsl.ApplyOnValidsFn[Result]: The evaluator function.
  • error: An error if the operation fails.
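The lex, parse, and AST flow can be pictured as a chain of stages that each turn one valid result into zero or more new results. The Go sketch below is a toy model, not the package's evrsl.ApplyOnValidsFn. It assumes, based on that name, that results already carrying an error are passed through unchanged rather than fed to the next stage. It uses two stages; an AST stage would be a third stage value. result, stage, runPipeline, toyLex, and toyParse are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// result is a toy stand-in for SlParser's Result: a value plus an optional error.
type result struct {
	value string
	err   error
}

// stage maps one valid value to zero or more new values (e.g. several parse trees).
type stage func(value string) ([]string, error)

// runPipeline applies the stages in order, feeding only valid results to each stage.
func runPipeline(inputs []result, stages ...stage) []result {
	current := inputs
	for _, st := range stages {
		var next []result
		for _, r := range current {
			if r.err != nil {
				next = append(next, r) // invalid results are carried through (assumption)
				continue
			}
			outs, err := st(r.value)
			if err != nil {
				next = append(next, result{value: r.value, err: err})
				continue
			}
			for _, out := range outs {
				next = append(next, result{value: out})
			}
		}
		current = next
	}
	return current
}

// toyLex normalizes whitespace and fails on blank input (compare ErrMissingData).
func toyLex(v string) ([]string, error) {
	if strings.TrimSpace(v) == "" {
		return nil, errors.New("missing data")
	}
	return []string{strings.Join(strings.Fields(v), " ")}, nil
}

// toyParse pretends the input is ambiguous: each " | " alternative becomes its own tree.
func toyParse(v string) ([]string, error) {
	return strings.Split(v, " | "), nil
}

func main() {
	for _, r := range runPipeline([]result{{value: "a  |  b"}, {value: " "}}, toyLex, toyParse) {
		fmt.Printf("value=%q err=%v\n", r.value, r.err)
	}
}
```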

Types

type ModifyFn added in v0.1.11

type ModifyFn[T interface {
	SetError(err error) T
}, R any] func(result *T, elem R)

type Result added in v0.1.11

type Result struct {
	// contains filtered or unexported fields
}

Result holds all the information regarding the parsing process.

func NewResult added in v0.1.11

func NewResult(data []byte) Result

NewResult creates a new result.

Parameters:

  • data: The data to create the result from.

Returns:

  • Result: The new result.

func (Result) AST added in v0.1.11

func (r Result) AST(table map[string]gr.ToASTFn) ([]Result, error)

AST transforms the parse tree into an abstract syntax tree.

The `ast` function must return one abstract syntax tree node for each parse tree root node. If the `ast` function returns an error, the entire result is marked as invalid.

If the `ast` function is nil, an error is returned.

If the parse tree is missing, an error is returned.

If the `ast` function returns anything other than exactly one abstract syntax tree node, an error is returned.

Parameters:

  • table: The AST maker to use for transforming the parse tree into an abstract syntax tree.

Returns:

  • []Result: A slice containing the result of the transformation process. If successful, it contains the abstract syntax tree nodes generated from the parse tree. Otherwise, it contains the error that occurred during the transformation process.
  • error: An error if the evaluation failed.

func (Result) Data added in v0.1.11

func (r Result) Data() ([]byte, error)

Data returns the data of the result.

Returns:

  • []byte: The data of the result.
  • error: An error if the data is not set.

Errors:

  • ErrMissingData: If the data is not set.

func (Result) Err added in v0.1.11

func (r Result) Err() error

Err returns the error of the result.

Returns:

  • error: The error of the result.

func (Result) HasError added in v0.1.11

func (r Result) HasError() bool

HasError implements the Resulter interface.

func (Result) Lex added in v0.1.11

func (r Result) Lex(lexer *sllx.Lexer) ([]Result, error)

Lex processes the input data using the provided lexer and returns a slice of results.

Parameters:

  • lexer: The lexer to use for processing the input data.

Returns:

  • []Result: A slice containing the result of the lexing process. If successful, it contains the tokens generated from the input data. Otherwise, it contains the error that occurred during the lexing process.
  • error: An error if the evaluation failed.

Errors:

  • ErrMissingData: If the Lex function is called before the data is set.
  • any other error: When the lexer is nil or any other error occurs during the lexing process.

func (Result) LexerErr added in v0.1.11

func (r Result) LexerErr() (error, error)

LexerErr returns the lexer error of the result.

Returns:

  • error: The lexer error of the result.
  • error: An error if the lexer error is not set.

func (Result) Node added in v0.1.11

func (r Result) Node() (*gr.Node, error)

Node returns the node of the result.

Returns:

  • *gr.Node: The node of the result.
  • error: An error if the node is not set.

func (Result) Parse added in v0.1.11

func (r Result) Parse(parser slpx.Parser) ([]Result, error)

Parse processes the input data using the provided parser and returns a slice of results.

Parameters:

  • parser: The parser to use for processing the input data.

Returns:

  • []Result: A slice containing the result of the parsing process. If successful, it contains the parse trees generated from the input data. Otherwise, it contains the error that occurred during the parsing process.
  • error: An error if the evaluation failed.

Errors:

  • ErrMissingTokens: If the Parse function is called before the tokens are set.
  • any other error: When the parser is nil or any other error occurs during the parsing process.

func (Result) ParseTree added in v0.1.11

func (r Result) ParseTree() (*slpx.Result, error)

ParseTree returns the parse tree of the result.

Returns:

  • *slpx.Result: The parse tree of the result.
  • error: An error if the parse tree is not set.

func (Result) SetError added in v0.1.11

func (r Result) SetError(err error) Result

SetError sets the error of the result.

Parameters:

  • err: The error to set.

Returns:

  • Result: The result with the error set.

func (Result) Tokens added in v0.1.11

func (r Result) Tokens() ([]*gr.Token, error)

Tokens returns the tokens of the result.

Returns:

  • []*gr.Token: The tokens of the result.
  • error: An error if the tokens are not set.

Directories

Path Synopsis
cmd
