tok

package

v0.0.8 Published: Oct 10, 2018 License: BSD-3-Clause Imports: 4 Imported by: 2

Repository

github.com/caltechlibrary/bibtex

README ¶

tok

A naive tokenizer library

Public Interface

Backup - given a token and buffer return a new buffer with the token's value as prefix
- parameters
  - Token
  - buffer (byte array)
- returns
  - buffer (byte array)
Between - returns the value between opening and closing delimiter values
- parameters
  - open value (byte array)
  - close value (byte array)
  - escape value (byte array)
  - buffer (byte array)
- returns
  - between content (byte array)
  - buffer (byte array)
  - error value if closing value not found before end of buffer
Peek - returns the next token without consuming the buffer being scanned
- parameters
  - buffer (byte array)
- returns
  - Token
Skip - scans through a buffer until a token is found, returns skipped content, token and remaining buffer
- parameters
  - token type (string)
  - buffer (byte array)
- returns
  - skipped content (byte array)
  - Token
  - buffer (byte array)
Skip2 - like Skip but allows a Tokenizer to be passed in rather than using the default Tok().
- parameters
  - token type (string)
  - buffer (byte array)
  - Tokenizer function
- returns
  - skipped content (byte array)
  - Token
  - buffer (byte array)
Token - a simple structure
- properties
  - Type is a string holding the label of the token type
  - Value is a byte array holding the value of the token
Tokenizer - is a type of function that can be applied by Tok2, may be recursive
- parameters
  - Token
  - byte array
- returns
  - Token
  - byte array of remaining buffer
Tok - is a simple, non-look ahead tokenizer
- parameter
  - a byte array representing the buffer to evaluate
- returns
  - a Token of Type Letter, Numeral, Punctuation or Space
  - the remaining buffer byte array
Tok2 - is a function that takes a buffer and a Tokenizer function, enabling look-ahead tokenization
- parameters
  - a byte array representing the buffer to evaluate
  - A Tokenizer function
- returns
  - a Token of Type defined by the Tokenizer function
  - the remaining buffer byte array
Words - is an example Tokenizer function
- returns tokens of type Numeral, Punctuation, Space and Word

Documentation ¶

Overview ¶

Package tok is a naive tokenizer

@author R. S. Doiel, <rsdoiel@caltech.edu>

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Index ¶

Constants
Variables
func Backup(token *Token, buf []byte) []byte
func Between(openValue []byte, closeValue []byte, escapeValue []byte, buf []byte) ([]byte, []byte, error)
func IsNumeral(b []byte) bool
func IsPunctuation(b []byte) bool
func IsSpace(b []byte) bool
func Next(buf []byte, re *regexp.Regexp) ([]byte, []byte)
func NextLine(buf []byte) ([]byte, []byte)
type Token
- func (t *Token) String() string
type TokenMap
type Tokenizer

Constants ¶

const (
	// Version of tok package
	Version = `v0.0.2`

	// Letter is an alphabetical letter (e.g. A-Z, a-z in English)
	Letter = "Letter"
	// Numeral is a single digit
	Numeral = "Numeral"
	// Punctuation is any non-numeral, non-alphabetical, non-space character (e.g. periods, colons, bang, hash mark)
	Punctuation = "Punctuation"
	// Space characters representing white space (e.g. space, tab, new line, carriage return)
	Space = "Space"

	// Word is a sequence of characters delimited by spaces
	Word = "Word"
	// OpenCurlyBracket, e.g. "{"
	OpenCurlyBracket = "OpenCurlyBracket"
	// CloseCurlyBracket, e.g. "}"
	CloseCurlyBracket = "CloseCurlyBracket"
	// CurlyBracket, e.g. "{}"
	CurlyBracket = "CurlyBracket"
	// OpenSquareBracket, e.g. "["
	OpenSquareBracket = "OpenSquareBracket"
	// CloseSquareBracket, e.g. "]"
	CloseSquareBracket = "CloseSquareBracket"
	// SquareBracket, e.g. "[]"
	SquareBracket = "SquareBracket"
	// OpenAngleBracket, e.g. "<"
	OpenAngleBracket = "OpenAngleBracket"
	// CloseAngleBracket, e.g. ">"
	CloseAngleBracket = "CloseAngleBracket"
	// AngleBracket, e.g. "<>"
	AngleBracket = "AngleBracket"
	// AtSign, e.g. "@"
	AtSign = "AtSign"
	// EqualSign, e.g. "="
	EqualSign = "EqualSign"
	// DoubleQuote, e.g. "\""
	DoubleQuote = "DoubleQuote"
	// SingleQuote, e.g., "'"
	SingleQuote = "SingleQuote"

	// EOF is an end of file token type. It is separate from Space only because it is a common stop condition
	EOF = "EOF"
)

Variables ¶

var (
	// Numerals holds the numeral characters as a byte slice
	Numerals = []byte("0123456789")

	// Spaces holds the white space characters as a byte slice
	Spaces = []byte(" \t\r\n")

	// PunctuationMarks holds the punctuation characters as a byte slice
	PunctuationMarks = []byte("~!@#$%^&*()_+`-=:{}|[]\\:;\"'<>?,./")

	// These map to the specialized tokens
	AtSignMark = []byte("@")
	// EqualMark, e.g. =
	EqualMark = []byte("=")
	// DoubleQuoteMark, e.g. "\""
	DoubleQuoteMark = []byte("\"")
	// SingleQuoteMark, e.g. "'"
	SingleQuoteMark = []byte("'")

	// OpenCurlyBrackets token
	OpenCurlyBrackets = []byte("{")
	// CloseCurlyBrackets token
	CloseCurlyBrackets = []byte("}")
	// CurlyBrackets tokens
	CurlyBrackets = []byte("{}")

	// OpenSquareBrackets token
	OpenSquareBrackets = []byte("[")
	// CloseSquareBrackets token
	CloseSquareBrackets = []byte("]")
	// SquareBrackets tokens
	SquareBrackets = []byte("[]")

	// OpenAngleBrackets token
	OpenAngleBrackets = []byte("<")
	// CloseAngleBrackets token
	CloseAngleBrackets = []byte(">")
	// AngleBrackets tokens
	AngleBrackets = []byte("<>")
)

Functions ¶

func Backup ¶

func Backup(token *Token, buf []byte) []byte

Backup pushes a Token back onto the front of a Buffer
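
As an illustration of the push-back behaviour described above, here is a minimal sketch. The `backup` function and the trimmed-down `Token` type are hypothetical stand-ins written for this example, not the package's own code; the sketch assumes the result is a freshly allocated buffer.

```go
package main

import "fmt"

// Token mirrors only the Type and Value fields of the package's Token struct.
type Token struct {
	Type  string
	Value []byte
}

// backup is a hypothetical stand-in for Backup: it returns a new buffer
// holding the token's value followed by the original buffer contents.
func backup(token *Token, buf []byte) []byte {
	out := make([]byte, 0, len(token.Value)+len(buf))
	out = append(out, token.Value...)
	return append(out, buf...)
}

func main() {
	rest := []byte("bc")
	restored := backup(&Token{Type: "Letter", Value: []byte("a")}, rest)
	fmt.Printf("%s\n", restored) // prints "abc"
}
```

Allocating a new slice avoids writing into memory that the caller's buffer may still share.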

func Between ¶

func Between(openValue []byte, closeValue []byte, escapeValue []byte, buf []byte) ([]byte, []byte, error)

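Between returns the content of buf between an opening and a closing delimiter (e.g. curly braces), the remaining buffer, and an error if the closing delimiter is not found before the end of the buffer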
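
The delimiter scan can be sketched roughly as follows. The `between` function is a hypothetical reimplementation for illustration only. It assumes that delimiters do not nest, that scanning starts at the first opening value, and that a closing value immediately preceded by the escape value counts as literal content; the package itself may differ on any of these points.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// between is a hypothetical sketch, not the package's implementation.
// It returns the bytes between the first openValue and the next unescaped
// closeValue, plus the buffer remaining after the closing delimiter.
func between(openValue, closeValue, escapeValue, buf []byte) ([]byte, []byte, error) {
	start := bytes.Index(buf, openValue)
	if start < 0 {
		return nil, buf, errors.New("opening value not found")
	}
	body := buf[start+len(openValue):]
	for i := 0; i < len(body); i++ {
		// Step over an escape sequence so an escaped close is not matched:
		// move past the escape here, the loop's i++ then skips the escaped byte.
		if len(escapeValue) > 0 && bytes.HasPrefix(body[i:], escapeValue) {
			i += len(escapeValue)
			continue
		}
		if bytes.HasPrefix(body[i:], closeValue) {
			return body[:i], body[i+len(closeValue):], nil
		}
	}
	return nil, buf, errors.New("closing value not found before end of buffer")
}

func main() {
	content, rest, err := between([]byte("{"), []byte("}"), []byte(`\`), []byte(`{a \} b} rest`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("content=%q rest=%q\n", content, rest)
}
```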

func IsNumeral ¶

func IsNumeral(b []byte) bool

IsNumeral checks to see if []byte is a number or not

func IsPunctuation ¶

func IsPunctuation(b []byte) bool

IsPunctuation checks to see if []byte is some punctuation or not

func IsSpace ¶

func IsSpace(b []byte) bool

IsSpace checks to see if []byte is a space or not
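
One way such checks could work is a membership test against the byte sets listed under Variables. The sketch below is an assumption about the approach, not the package's code: `inSet`, `isNumeral` and `isSpace` are hypothetical names, and the sketch assumes the input is a single byte, as a one-character tokenizer would produce.

```go
package main

import (
	"bytes"
	"fmt"
)

// Character sets copied from the package's Variables section.
var (
	numerals = []byte("0123456789")
	spaces   = []byte(" \t\r\n")
)

// inSet is a hypothetical helper: it reports whether b is exactly one byte
// and that byte appears in set.
func inSet(b, set []byte) bool {
	return len(b) == 1 && bytes.IndexByte(set, b[0]) >= 0
}

// isNumeral and isSpace are illustrative stand-ins for IsNumeral and IsSpace.
func isNumeral(b []byte) bool { return inSet(b, numerals) }

func isSpace(b []byte) bool { return inSet(b, spaces) }

func main() {
	for _, s := range []string{"7", "\t", "x"} {
		b := []byte(s)
		fmt.Printf("%q numeral=%t space=%t\n", s, isNumeral(b), isSpace(b))
	}
}
```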

func Next ¶

func Next(buf []byte, re *regexp.Regexp) ([]byte, []byte)

Next takes a buffer ([]byte) and a compiled regular expression (*regexp.Regexp) and returns two []byte values: the first is the part of the buffer up to where the expression is found (or up to the end of buf if it is not found), and the second is the remaining []byte
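
A rough sketch of this kind of split, using only the standard `regexp` package. The `next` function and the `equals` pattern are hypothetical names for this example, and the sketch assumes the matched text itself is dropped from both results, which the documentation above does not state.

```go
package main

import (
	"fmt"
	"regexp"
)

// equals matches an equal sign with optional surrounding white space.
var equals = regexp.MustCompile(`\s*=\s*`)

// next is a hypothetical stand-in for Next. It returns the bytes before the
// first match of re and the bytes after that match. If re never matches,
// the whole buffer is returned and the remainder is nil.
func next(buf []byte, re *regexp.Regexp) ([]byte, []byte) {
	loc := re.FindIndex(buf)
	if loc == nil {
		return buf, nil
	}
	return buf[:loc[0]], buf[loc[1]:]
}

func main() {
	head, rest := next([]byte("title = {A Book}"), equals)
	fmt.Printf("%q %q\n", head, rest)
}
```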

func NextLine ¶

func NextLine(buf []byte) ([]byte, []byte)

NextLine takes a buffer ([]byte) and returns the next line as a []byte and the remainder as a []byte.
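
Reading a buffer line by line with a function of this shape might look like the sketch below. `nextLine` is a hypothetical stand-in that assumes lines end in "\n" and that the newline is dropped from both return values.

```go
package main

import (
	"bytes"
	"fmt"
)

// nextLine is a hypothetical stand-in for NextLine. It splits buf at the
// first "\n", returning the line without the newline and the remainder.
func nextLine(buf []byte) ([]byte, []byte) {
	i := bytes.IndexByte(buf, '\n')
	if i < 0 {
		return buf, nil
	}
	return buf[:i], buf[i+1:]
}

func main() {
	buf := []byte("@book{key,\n  title = {A Book}\n}")
	var line []byte
	// Consume the buffer one line at a time until nothing remains.
	for len(buf) > 0 {
		line, buf = nextLine(buf)
		fmt.Printf("%q\n", line)
	}
}
```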

Types ¶

type Token ¶

type Token struct {
	XMLName xml.Name `json:"-"`
	Type    string   `xml:"type" json:"type"`
	Value   []byte   `xml:"value" json:"value"`
}

Token structure for emitting simple tokens and values from Tok() and Tok2()
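
Because Value is a []byte, the standard `encoding/json` package encodes it as a base64 string rather than as plain text. The sketch below copies the struct definition shown above into a standalone program to show the effect; `tokenJSON` is a hypothetical helper added for this example.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// Token is copied from the type definition above.
type Token struct {
	XMLName xml.Name `json:"-"`
	Type    string   `xml:"type" json:"type"`
	Value   []byte   `xml:"value" json:"value"`
}

// tokenJSON is a hypothetical helper that renders a Token as a JSON string.
func tokenJSON(t *Token) (string, error) {
	src, err := json.Marshal(t)
	return string(src), err
}

func main() {
	s, err := tokenJSON(&Token{Type: "Letter", Value: []byte("a")})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // prints {"type":"Letter","value":"YQ=="}
}
```

Consumers of the JSON form therefore need to base64-decode the value field to recover the original bytes.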

func Peek ¶

func Peek(buf []byte) *Token

Peek generates a token without consuming the buffer

func Skip ¶

func Skip(tokenType string, buf []byte) ([]byte, *Token, []byte)

Skip provides a means to advance to the next non-target Token.

func Skip2 ¶

func Skip2(tokenType string, buf []byte, fn Tokenizer) ([]byte, *Token, []byte)

Skip2 is like Skip, but takes a Tokenizer function rather than using the default Tok()

func Tok ¶

func Tok(buf []byte) (*Token, []byte)

Tok is a naive tokenizer that looks only at the next character: it shifts that character off the []byte and returns the token found along with the remaining []byte
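
A one-character tokenizer of this kind can be sketched as follows. This is an illustrative reimplementation, not the package source: `tok` and the lower-case variable names are hypothetical, the character sets are copied from the Variables section, and the sketch assumes that an empty buffer yields an EOF token and that any byte outside the three sets is a Letter.

```go
package main

import (
	"bytes"
	"fmt"
)

// Token mirrors only the Type and Value fields of the package's Token struct.
type Token struct {
	Type  string
	Value []byte
}

// Character sets copied from the package's Variables section.
var (
	numerals         = []byte("0123456789")
	spaces           = []byte(" \t\r\n")
	punctuationMarks = []byte("~!@#$%^&*()_+`-=:{}|[]\\:;\"'<>?,./")
)

// tok is a hypothetical stand-in for Tok: it shifts one byte off buf,
// classifies it, and returns the token with the remaining buffer.
func tok(buf []byte) (*Token, []byte) {
	if len(buf) == 0 {
		return &Token{Type: "EOF"}, nil // assumption: empty input means EOF
	}
	c, rest := buf[:1], buf[1:]
	switch {
	case bytes.Contains(numerals, c):
		return &Token{Type: "Numeral", Value: c}, rest
	case bytes.Contains(spaces, c):
		return &Token{Type: "Space", Value: c}, rest
	case bytes.Contains(punctuationMarks, c):
		return &Token{Type: "Punctuation", Value: c}, rest
	}
	return &Token{Type: "Letter", Value: c}, rest
}

func main() {
	buf := []byte("a1 !")
	for {
		t, rest := tok(buf)
		if t.Type == "EOF" {
			break
		}
		fmt.Printf("%s %q\n", t.Type, t.Value)
		buf = rest
	}
}
```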

func Tok2 ¶

func Tok2(buf []byte, fn Tokenizer) (*Token, []byte)

Tok2 provides an easy-to-implement look-ahead tokenizer: the caller defines the look-ahead behaviour by passing in a Tokenizer function
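
The look-ahead pattern might be sketched like this: a basic tokenizer produces one token, and a Tokenizer function then inspects the remaining buffer to decide whether to grow that token. Everything below (`tok`, `tok2`, `words`, the reduced punctuation set) is a hypothetical simplification for illustration, with only the Tokenizer signature taken from the documentation on this page.

```go
package main

import (
	"bytes"
	"fmt"
)

// Token mirrors only the Type and Value fields of the package's Token struct.
type Token struct {
	Type  string
	Value []byte
}

// Tokenizer has the signature documented for the package's Tokenizer type.
type Tokenizer func(*Token, []byte) (*Token, []byte)

// tok is a simplified, hypothetical one-byte tokenizer.
func tok(buf []byte) (*Token, []byte) {
	if len(buf) == 0 {
		return &Token{Type: "EOF"}, nil
	}
	c, rest := buf[:1], buf[1:]
	switch {
	case bytes.Contains([]byte("0123456789"), c):
		return &Token{Type: "Numeral", Value: c}, rest
	case bytes.Contains([]byte(" \t\r\n"), c):
		return &Token{Type: "Space", Value: c}, rest
	case bytes.Contains([]byte(".,;:!?"), c):
		return &Token{Type: "Punctuation", Value: c}, rest
	}
	return &Token{Type: "Letter", Value: c}, rest
}

// tok2 takes one token with tok and lets fn revise it by looking ahead.
func tok2(buf []byte, fn Tokenizer) (*Token, []byte) {
	t, rest := tok(buf)
	return fn(t, rest)
}

// words merges a run of Letter tokens into a single Word token.
func words(t *Token, buf []byte) (*Token, []byte) {
	if t.Type != "Letter" {
		return t, buf
	}
	// Copy the value so appending never writes into the caller's buffer.
	word := &Token{Type: "Word", Value: append([]byte{}, t.Value...)}
	for {
		next, rest := tok(buf)
		if next.Type != "Letter" {
			return word, buf
		}
		word.Value = append(word.Value, next.Value...)
		buf = rest
	}
}

func main() {
	buf := []byte("hello world")
	for {
		t, rest := tok2(buf, words)
		if t.Type == "EOF" {
			break
		}
		fmt.Printf("%s %q\n", t.Type, t.Value)
		buf = rest
	}
}
```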

func TokenFromMap ¶

func TokenFromMap(t *Token, m map[string][]byte) *Token

TokenFromMap re-evaluates the token's type against a map of type names and byte arrays and returns the modified Token
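
The re-typing step could work along these lines. `tokenFromMap` and the `marks` map are hypothetical and written for this example only; the sketch assumes a token is re-typed when its value appears in one of the map's byte sets, and that the sets are disjoint so Go's unordered map iteration does not affect the result.

```go
package main

import (
	"bytes"
	"fmt"
)

// Token mirrors only the Type and Value fields of the package's Token struct.
type Token struct {
	Type  string
	Value []byte
}

// marks is an example map from specialized type names to their byte sets.
var marks = map[string][]byte{
	"AtSign":            []byte("@"),
	"EqualSign":         []byte("="),
	"OpenCurlyBracket":  []byte("{"),
	"CloseCurlyBracket": []byte("}"),
}

// tokenFromMap is a hypothetical stand-in for TokenFromMap. It changes the
// token's Type to the first map entry whose byte set contains the value,
// and leaves the token untouched when nothing matches.
func tokenFromMap(t *Token, m map[string][]byte) *Token {
	if len(t.Value) == 0 {
		return t
	}
	for name, set := range m {
		if bytes.Contains(set, t.Value) {
			t.Type = name
			break
		}
	}
	return t
}

func main() {
	t := tokenFromMap(&Token{Type: "Punctuation", Value: []byte("@")}, marks)
	fmt.Println(t.Type) // prints "AtSign"
}
```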

func Words ¶

func Words(tok *Token, buf []byte) (*Token, []byte)

Words is an example of implementing a Tokenizer function

func (*Token) String ¶

func (t *Token) String() string

String returns a human readable Token struct

type TokenMap ¶

type TokenMap map[string][]byte

TokenMap is a map of simple token names and associated array of possible bytes

type Tokenizer ¶

type Tokenizer func(*Token, []byte) (*Token, []byte)

Tokenizer is a function that takes a current token, looks ahead in []byte and returns a revised token and remaining []byte

Source Files ¶

tok.go