Documentation ¶
Index ¶
- Constants
- Variables
- func GetTokenizers() map[string]Tokenizer
- func Register(name string, t Tokenizer)
- type Emitter
- type Filter
- type FilterFunc
- type Filters
- type IncludeRule
- type Lexer
- func (l Lexer) AcceptsFilename(name string) (bool, error)
- func (l Lexer) AcceptsMediaType(media string) (bool, error)
- func (l Lexer) Format(r *bufio.Reader, emit func(Token) error) error
- func (l Lexer) ListFilenames() []string
- func (l Lexer) ListMediaTypes() []string
- func (l Lexer) Tokenize(br *bufio.Reader, emit func(Token) error) error
- func (l Lexer) TokenizeString(s string) ([]Token, error)
- type RegexpRule
- type Rule
- type RuleSpec
- type Stack
- type State
- type StateMap
- type States
- type StatesSpec
- type Token
- type TokenType
- type Tokenizer
Constants ¶
const (
	// Error, emitted when unexpected token was encountered.
	Error TokenType = "error"

	// Comment e.g. `// this should never happen`
	Comment = "comment"

	// Number - e.g. `2716057` in `"serial": 2716057` or `serial = 2716057;`
	Number = "number"

	// String - e.g. `Fry` in `"name": "Fry"` or `var name = "Fry";`
	String = "string"

	// Text - e.g. `Fry` in `<p>Fry</p>`
	Text = "text"

	// Attribute - e.g. `name` in `"name": "Fry"`, or `font-size` in
	// `font-size: 1.2rem;`
	Attribute = "attribute"

	// Assignment - e.g. `=` in `int x = y;` or `:` in `font-size: 1.2rem;`
	Assignment = "assignment"

	// Operator - e.g. `+`/`-` in `int x = a + b - c;`
	Operator = "operator"

	// Punctuation - e.g. semi/colons in `int x, j;`
	Punctuation = "punctuation"

	// Literal - e.g. `true`/`false`/`null`.
	Literal = "literal"

	// Tag - e.g. `html`/`div`/`b`
	Tag = "tag"

	// Whitespace - e.g. \n, \t
	Whitespace = "whitespace"
)
Variables ¶
var EndToken = Token{}
var MergeTokensFilter = FilterFunc(
	func(out func(Token) error) func(Token) error {
		curr := Token{}
		return func(t Token) error {
			if t.Type == "" {
				out(curr)
				return io.EOF
			} else if t.Type == curr.Type {
				curr.Value += t.Value
				return nil
			} else if curr.Value != "" {
				out(curr)
			}
			curr = Token{
				Value: t.Value,
				Type:  t.Type,
				State: t.State,
			}
			return nil
		}
	})
MergeTokensFilter combines Tokens if they have the same type.
var PassthroughFilter = FilterFunc(
	func(out func(Token) error) func(Token) error {
		return func(t Token) error {
			return out(t)
		}
	})
PassthroughFilter simply emits each token to the output without modification.
var RemoveEmptiesFilter = FilterFunc(
	func(out func(Token) error) func(Token) error {
		return func(t Token) error {
			if t == EndToken || t.Value != "" {
				return out(t)
			}
			return nil
		}
	})
RemoveEmptiesFilter removes empty (zero-length) tokens from the output.
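As a sketch of how the built-in filters are typically combined (assuming the package is imported under the name `lexer`; the import path is not shown on this page), they are plain values and can be placed into a Filters chain, for example on a Lexer's Filters field:

	// Drop zero-length tokens first, then coalesce adjacent tokens of the
	// same type. The `lexer` package name is an assumption.
	filters := lexer.Filters{
		lexer.RemoveEmptiesFilter,
		lexer.MergeTokensFilter,
	}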
Functions ¶
func GetTokenizers ¶
GetTokenizers returns the map of known Tokenizers.
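A short sketch of walking the registry (again assuming the package is imported as `lexer`); tokenizers added via Register appear in this map under the name they were registered with:

	for name, t := range lexer.GetTokenizers() {
		// Print each registered tokenizer and the filename patterns it advertises.
		fmt.Printf("%s: %v\n", name, t.ListFilenames())
	}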
Types ¶
type Filter ¶
type Filter interface {
	// Filter reads tokens from `in` and outputs tokens to `out`, typically
	// modifying or filtering them along the way. The function should return
	// as soon as the input is exhausted (i.e. the channel is closed), or an
	// error is encountered.
	Filter(out func(Token) error) func(Token) error
}
Filter describes a type that is capable of filtering/processing tokens.
type FilterFunc ¶
FilterFunc is a helper type allowing filter functions to be used as filters.
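For illustration, a custom filter can be written as an ordinary function and wrapped in FilterFunc, following the same shape as the built-in filters above. A hedged sketch (the `lexer` package name and the strings import are assumptions):

	// UppercaseFilter is a hypothetical filter that upper-cases every
	// token's value before passing it on to the next stage.
	var UppercaseFilter = lexer.FilterFunc(
		func(out func(lexer.Token) error) func(lexer.Token) error {
			return func(t lexer.Token) error {
				t.Value = strings.ToUpper(t.Value)
				return out(t)
			}
		})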
type Filters ¶
type Filters []Filter
func (Filters) Filter ¶
Filter runs the input through each filter in series, emitting the final result to `out`. This function will return as soon as the last token has been processed, or if an error is encountered by one of the filters.
It is safe to close the output channel as soon as this function returns.
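A sketch of driving a chain directly, assuming Filters.Filter has the same shape as the Filter interface above: wrap the final output function, then feed tokens into the function it returns.

	// Build the chain's entry point around a final output function.
	sink := lexer.Filters{
		lexer.RemoveEmptiesFilter,
		lexer.MergeTokensFilter,
	}.Filter(func(t lexer.Token) error {
		fmt.Printf("%s %q\n", t.Type, t.Value)
		return nil
	})

	// Feed tokens into the chain. Sending EndToken last appears to flush any
	// buffered output (an inference from MergeTokensFilter above).
	sink(lexer.Token{Type: lexer.String, Value: "Fr"})
	sink(lexer.Token{Type: lexer.String, Value: "y"})
	sink(lexer.EndToken)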
type IncludeRule ¶
IncludeRule allows the states of another Rule to be referenced.
func (IncludeRule) Stack ¶
func (r IncludeRule) Stack() []string
type Lexer ¶
type Lexer struct {
	Name      string
	States    States
	Filters   Filters
	Formatter Filter
	Filenames []string
	MimeTypes []string
}
Lexer defines a simple state-based lexer.
func (Lexer) AcceptsFilename ¶
AcceptsFilename returns true if this Lexer thinks it is suitable for the given filename. An error will be returned iff an invalid filename pattern is registered by the Lexer.
func (Lexer) AcceptsMediaType ¶
AcceptsMediaType returns true if this Lexer thinks it is suitable for the given media (MIME) type. An error will be returned iff the given mime type is invalid.
func (Lexer) ListFilenames ¶
ListFilenames lists the filename patterns this Lexer supports, e.g. ["*.json"]
func (Lexer) ListMediaTypes ¶
ListMediaTypes lists the media types this Lexer supports, e.g. ["application/json"]
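A sketch of typical use, assuming `l` is an already-configured Lexer (its States, Filters and Filenames populated elsewhere) and the `lexer` package name as above:

	ok, err := l.AcceptsFilename("futurama.json")
	if err != nil {
		log.Fatal(err)
	}
	if ok {
		// TokenizeString collects all tokens for a small in-memory input.
		tokens, err := l.TokenizeString(`{"name": "Fry"}`)
		if err != nil {
			log.Fatal(err)
		}
		for _, t := range tokens {
			fmt.Printf("%s %q\n", t.Type, t.Value)
		}
	}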
type RegexpRule ¶
type RegexpRule struct {
	Regexp     *regexp.Regexp
	Type       TokenType
	SubTypes   []TokenType
	NextStates []string
}
RegexpRule matches a state if the subject matches a regular expression.
func NewRegexpRule ¶
func NewRegexpRule(re string, t TokenType, subTypes []TokenType, next []string) RegexpRule
NewRegexpRule creates a new regular expression Rule.
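As a hedged sketch (the exact anchoring behaviour of the expressions is not described on this page), a rule that recognises a run of digits as a Number and stays in the current state might look like:

	// Hypothetical rule: digits become a Number token; nil SubTypes because
	// the expression has no groups, and nil next states to stay put.
	digits := lexer.NewRegexpRule(`[0-9]+`, lexer.Number, nil, nil)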
func (RegexpRule) Find ¶
func (r RegexpRule) Find(subject string) (int, Rule)
Find returns the first position in subject where this Rule will match, or -1 if no match was found.
func (RegexpRule) Match ¶
Match attempts to match against the beginning of the given search string. Returns the number of characters matched, and an array of tokens.
If the regular expression contains groups, they will be matched with the corresponding token type in `Rule.SubTypes`. Any text in between groups will be returned using the token type defined by `Rule.Type`.
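For example, a rule for a JSON-style `"key": "value"` pair could give each group its own type, with the text between the groups falling back to the rule's own Type. A sketch, with names assumed as above:

	// Two groups: the key becomes an Attribute token and the value a String
	// token; the `: ` between them is emitted using the rule's Type.
	pair := lexer.NewRegexpRule(
		`("[^"]*"): ("[^"]*")`,
		lexer.Punctuation,
		[]lexer.TokenType{lexer.Attribute, lexer.String},
		nil)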
func (RegexpRule) Stack ¶
func (r RegexpRule) Stack() []string
type RuleSpec ¶
type RuleSpec struct {
	// Regexp is the regular expression this rule should match against.
	Regexp string

	// Type is the token type for strings that match this rule.
	Type TokenType

	// SubTypes contains an ordered array of token types matching the order
	// of groups in the Regexp expression.
	SubTypes []TokenType

	// State indicates the next state to migrate to if this rule is
	// triggered.
	State string

	// Include specifies a state to run
	Include string
}
RuleSpec describes the conditions required to match some subject text.
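A minimal sketch of a specification entry (the "string" state name is hypothetical): an opening double quote emits a String token and moves the lexer into a dedicated string state.

	open := lexer.RuleSpec{
		Regexp: `"`,
		Type:   lexer.String,
		State:  "string",
	}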
type Stack ¶
type Stack []string
Stack is a simple stack of string values.
type State ¶
type State []Rule
State is a list of matching Rules.
func (State) Find ¶
Find examines the provided string, looking for a match within the current state. It returns the position `n` at which a rule match was found, and the rule itself.
-1 will be returned if no rule could be matched, in which case the caller should disregard the string entirely (emit it as an error), and continue on to the next line of input.
0 will be returned if a rule matches at the start of the string.
Otherwise, this function will return a number of characters to skip before reaching the first matched rule. The caller should emit those first `n` characters as an error, and emit the remaining characters according to the rule.
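A hypothetical illustration of the three cases, for a state whose only rule matches a run of digits:

	// state.Find("123;")  ->  0, rule   // rule matches at the start
	// state.Find("ab12")  ->  2, rule   // emit "ab" as an error, then apply the rule
	// state.Find("abcd")  -> -1, nil    // no rule matches anywhere in the string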
func (State) Match ¶
Match tests the subject text against all rules within the State. If a match is found, it returns the number of characters consumed, a series of tokens consumed from the subject text, and the specific Rule that was successfully matched against.
If the start of the subject text can not be matched against any known rule, it will return a position of -1 and a nil Rule.
type StatesSpec ¶
StatesSpec is a container for Lexer rule specifications, and can be compiled into a full state machine.
func (StatesSpec) Compile ¶
func (m StatesSpec) Compile() (States, error)
Compile compiles the specified states into a complete State machine, returning an error if any state fails to compile for any reason.
func (StatesSpec) Get ¶
func (m StatesSpec) Get(name string) State
func (StatesSpec) MustCompile ¶
func (m StatesSpec) MustCompile() States
MustCompile is a helper method that compiles the State specification, panicking on error.
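A sketch of both entry points, assuming `spec` is a populated StatesSpec (its literal form is not shown on this page) and the `lexer` package name as above:

	// Compile when a failure should be handled at runtime.
	states, err := spec.Compile()
	if err != nil {
		log.Fatal(err)
	}
	l := lexer.Lexer{Name: "example", States: states}

	// MustCompile when the specification is static and a failure is a
	// programming error, e.g. while initialising a package-level Lexer.
	l = lexer.Lexer{Name: "example", States: spec.MustCompile()}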
type Token ¶
Token represents one item of parsed output, containing the parsed value and its detected type.
type Tokenizer ¶
type Tokenizer interface {
	// Tokenize reads from the given input and emits tokens to the output
	// channel. Will end on any error from the reader, including io.EOF to
	// signify the end of input.
	Tokenize(*bufio.Reader, func(Token) error) error

	// Format behaves exactly as Tokenize, except it also formats the output.
	Format(*bufio.Reader, func(Token) error) error

	// AcceptsFilename returns true if this Lexer thinks it is suitable for
	// the given filename. An error will be returned iff an invalid filename
	// pattern is registered by the Lexer.
	AcceptsFilename(name string) (bool, error)

	// AcceptsMediaType returns true if this Lexer thinks it is suitable for
	// the given media (MIME) type. An error will be returned iff the given
	// mime type is invalid.
	AcceptsMediaType(name string) (bool, error)

	// ListMediaTypes lists the media types this Tokenizer advertises support
	// for, e.g. ["application/json"]
	ListMediaTypes() []string

	// ListFilenames lists the filename patterns this Tokenizer advertises
	// support for, e.g. ["*.json"]
	ListFilenames() []string
}
Tokenizer represents a type capable of tokenizing data from an input source.
func GetTokenizer ¶
GetTokenizer returns the Tokenizer of the given name.
func GetTokenizerForContentType ¶
GetTokenizerForContentType returns a Tokenizer for the given content type (e.g. "text/html" or "application/json"), or nil if one is not found.
func GetTokenizerForFilename ¶
GetTokenizerForFilename returns a Tokenizer for the given filename (e.g. "index.html" or "jasons.json"), or nil if one is not found.
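Putting the lookups together, a sketch that picks a Tokenizer by filename and streams tokens from a file (the `lexer` package name, the file handling, and the bufio/fmt/io/log/os imports are assumptions):

	t := lexer.GetTokenizerForFilename("jasons.json")
	if t == nil {
		log.Fatal("no tokenizer registered for this filename")
	}

	f, err := os.Open("jasons.json")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Stream tokens as they are produced; the callback is invoked once per token.
	err = t.Tokenize(bufio.NewReader(f), func(tok lexer.Token) error {
		fmt.Printf("%-12s %q\n", tok.Type, tok.Value)
		return nil
	})
	if err != nil && err != io.EOF {
		log.Fatal(err)
	}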