Documentation ¶
Overview ¶
Package tok is a naive tokenizer.
@author R. S. Doiel, <rsdoiel@caltech.edu>
Copyright (c) 2016, Caltech
All rights not granted herein are expressly reserved by Caltech.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Index ¶
- Constants
- Variables
- func Backup(token *Token, buf []byte) []byte
- func Between(openValue []byte, closeValue []byte, escapeValue []byte, buf []byte) ([]byte, []byte, error)
- func IsNumeral(b []byte) bool
- func IsPunctuation(b []byte) bool
- func IsSpace(b []byte) bool
- func Next(buf []byte, re *regexp.Regexp) ([]byte, []byte)
- func NextLine(buf []byte) ([]byte, []byte)
- type Token
- func Peek(buf []byte) *Token
- func Skip(tokenType string, buf []byte) ([]byte, *Token, []byte)
- func Skip2(tokenType string, buf []byte, fn Tokenizer) ([]byte, *Token, []byte)
- func Tok(buf []byte) (*Token, []byte)
- func Tok2(buf []byte, fn Tokenizer) (*Token, []byte)
- func TokenFromMap(t *Token, m map[string][]byte) *Token
- func Words(tok *Token, buf []byte) (*Token, []byte)
- type TokenMap
- type Tokenizer
Constants ¶
const (
	// Version of tok package
	Version = `v0.0.2`

	// Letter is an alphabetical letter (e.g. A-Z, a-z in English)
	Letter = "Letter"
	// Numeral is a single digit
	Numeral = "Numeral"
	// Punctuation is any non-number, non-alphabetical, non-space character (e.g. periods, colons, bang, hash mark)
	Punctuation = "Punctuation"
	// Space characters representing white space (e.g. space, tab, new line, carriage return)
	Space = "Space"
	// Word is a sequence of characters delimited by spaces
	Word = "Word"
	// OpenCurlyBracket, e.g. "{"
	OpenCurlyBracket = "OpenCurlyBracket"
	// CloseCurlyBracket, e.g. "}"
	CloseCurlyBracket = "CloseCurlyBracket"
	// CurlyBracket, e.g. "{}"
	CurlyBracket = "CurlyBracket"
	// OpenSquareBracket, e.g. "["
	OpenSquareBracket = "OpenSquareBracket"
	// CloseSquareBracket, e.g. "]"
	CloseSquareBracket = "CloseSquareBracket"
	// SquareBracket, e.g. "[]"
	SquareBracket = "SquareBracket"
	// OpenAngleBracket, e.g. "<"
	OpenAngleBracket = "OpenAngleBracket"
	// CloseAngleBracket, e.g. ">"
	CloseAngleBracket = "CloseAngleBracket"
	// AngleBracket, e.g. "<>"
	AngleBracket = "AngleBracket"
	// AtSign, e.g. "@"
	AtSign = "AtSign"
	// EqualSign, e.g. "="
	EqualSign = "EqualSign"
	// DoubleQuote, e.g. "\""
	DoubleQuote = "DoubleQuote"
	// SingleQuote, e.g. "'"
	SingleQuote = "SingleQuote"
	// EOF is an end of file token type. It is separate from Space only because it is a common stop condition
	EOF = "EOF"
)
Variables ¶
var (
	// Numerals is the set of digit characters
	Numerals = []byte("0123456789")
	// Spaces is the set of white space characters
	Spaces = []byte(" \t\r\n")
	// PunctuationMarks is the set of punctuation characters
	PunctuationMarks = []byte("~!@#$%^&*()_+`-=:{}|[]\\:;\"'<>?,./")

	// These map to the specialized tokens
	AtSignMark = []byte("@")
	// EqualMark, e.g. =
	EqualMark = []byte("=")
	// DoubleQuoteMark, e.g. "\""
	DoubleQuoteMark = []byte("\"")
	// SingleQuoteMark, e.g. "'"
	SingleQuoteMark = []byte("'")
	// OpenCurlyBrackets token
	OpenCurlyBrackets = []byte("{")
	// CloseCurlyBrackets token
	CloseCurlyBrackets = []byte("}")
	// CurlyBrackets tokens
	CurlyBrackets = []byte("{}")
	// OpenSquareBrackets token
	OpenSquareBrackets = []byte("[")
	// CloseSquareBrackets token
	CloseSquareBrackets = []byte("]")
	// SquareBrackets tokens
	SquareBrackets = []byte("[]")
	// OpenAngleBrackets token
	OpenAngleBrackets = []byte("<")
	// CloseAngleBrackets token
	CloseAngleBrackets = []byte(">")
	// AngleBrackets tokens
	AngleBrackets = []byte("<>")
)
Functions ¶
func Between ¶
func Between(openValue []byte, closeValue []byte, escapeValue []byte, buf []byte) ([]byte, []byte, error)
Between returns the portion of buf found between two delimiters (e.g. curly braces).
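The page does not show how Between is implemented or what each return value holds, so the following is only a minimal sketch of delimiter extraction with the same parameter list. The function name between, the meaning assigned to the two returned slices (the enclosed content, then the rest of the buffer), and the escape handling are all assumptions made for illustration, and nested delimiters are not handled.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// between is a hypothetical stand-in for tok.Between, not the package's code.
// Assumed behavior: return the bytes enclosed by the first openValue and the
// next unescaped closeValue, followed by whatever remains after closeValue.
func between(openValue, closeValue, escapeValue, buf []byte) ([]byte, []byte, error) {
	start := bytes.Index(buf, openValue)
	if start < 0 {
		return nil, buf, errors.New("opening delimiter not found")
	}
	contentStart := start + len(openValue)
	i := contentStart
	for i < len(buf) {
		// Skip the escape sequence and the single byte it protects.
		if len(escapeValue) > 0 && bytes.HasPrefix(buf[i:], escapeValue) {
			i += len(escapeValue) + 1
			continue
		}
		if bytes.HasPrefix(buf[i:], closeValue) {
			return buf[contentStart:i], buf[i+len(closeValue):], nil
		}
		i++
	}
	return nil, buf, errors.New("closing delimiter not found")
}

func main() {
	inner, rest, err := between([]byte("{"), []byte("}"), []byte("\\"), []byte(`{a\}b} rest`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("inner=%q rest=%q\n", inner, rest)
}
```

In this sketch the escape bytes are left in the returned content; whether the real function strips them is not stated on this page.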
func IsPunctuation ¶
func IsPunctuation(b []byte) bool
IsPunctuation checks whether the []byte is a punctuation mark.
Types ¶
type Token ¶
type Token struct {
	XMLName xml.Name `json:"-"`
	Type    string   `xml:"type" json:"type"`
	Value   []byte   `xml:"value" json:"value"`
}
Token is the structure for emitting simple tokens and their values from Tok() and Tok2().
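Because Value is a []byte, the struct tags above have a consequence worth seeing once: Go's encoding/json encodes byte slices as base64 strings. The sketch below copies the Token definition from this page into a standalone program; the helper name tokenJSON is made up for the example.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// Token is copied from the type definition on this page so the example
// compiles on its own.
type Token struct {
	XMLName xml.Name `json:"-"`
	Type    string   `xml:"type" json:"type"`
	Value   []byte   `xml:"value" json:"value"`
}

// tokenJSON is a hypothetical helper that renders a Token as a JSON string.
func tokenJSON(t *Token) (string, error) {
	b, err := json.Marshal(t)
	return string(b), err
}

func main() {
	s, err := tokenJSON(&Token{Type: "Letter", Value: []byte("a")})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The value "a" appears base64-encoded as "YQ==".
	fmt.Println(s) // {"type":"Letter","value":"YQ=="}
}
```

A consumer reading this JSON needs to base64-decode the value field to recover the original bytes.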
func Tok ¶
func Tok(buf []byte) (*Token, []byte)
Tok is a naive tokenizer that looks only at the next character: it shifts that character off the []byte and returns the token found along with the remaining []byte.
func Tok2 ¶
func Tok2(buf []byte, fn Tokenizer) (*Token, []byte)
Tok2 provides an easy-to-implement look-ahead tokenizer: the caller supplies a look-ahead function that refines the token.
func TokenFromMap ¶
func TokenFromMap(t *Token, m map[string][]byte) *Token
TokenFromMap re-evaluates a token's type against a map of type names to byte slices and returns the modified Token.