Documentation ¶
Constants ¶
const (
	// LiteralType is the type used for literal tokens.
	LiteralType = "l"
)
Variables ¶
var (
	// End is a pseudo token that ends a chain.
	End = NewToken("e", "")
)
Functions ¶
func TokensEqual ¶
func TokensEqual(a, b Token) bool
TokensEqual checks two tokens for equality. Two tokens are considered equal if their type and identifier match.
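Per the Token documentation, this is equivalent to comparing type and identifier directly, as in this sketch (a and b are assumed to be Token values in scope):

// equal is true iff a and b are considered equal; their Value() may
// still differ if it is generated dynamically.
equal := a.Type() == b.Type() && a.Identifier() == b.Identifier()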
Types ¶
type ReaderTokenizer ¶
type ReaderTokenizer struct {
// contains filtered or unexported fields
}
ReaderTokenizer turns data from an io.Reader into a stream of tokens. It turns newlines ('\n') into End tokens and returns everything else as literal tokens. Each literal token contains either only whitespace and punctuation, or no whitespace and punctuation at all. Two consecutive tokens never contain the same class of characters.
Punctuation and whitespace are everything that is a Unicode punctuation character (category P) or has Unicode's White Space property. See the unicode package for details.
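For example, a sketch of the token stream one would expect for a short input (illustrative, not verified output):

t := NewTokenizer(strings.NewReader("Hello, world!\n"))
// Expected tokens, in order:
//   literal "Hello"  (no whitespace or punctuation)
//   literal ", "     (only whitespace and punctuation)
//   literal "world"
//   literal "!"
//   End              (for the newline)
// After that, Next returns io.EOF.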
func NewTokenizer ¶
func NewTokenizer(r io.Reader) *ReaderTokenizer
NewTokenizer creates a new ReaderTokenizer for the given reader.
func (*ReaderTokenizer) Next ¶
func (t *ReaderTokenizer) Next() (Token, error)
Next returns the next token. See the description of ReaderTokenizer for an explanation of what kinds of tokens to expect.
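A typical consumption loop, as a sketch (assuming the usual imports):

tok := NewTokenizer(strings.NewReader("some input\n"))
for {
	t, err := tok.Next()
	if err == io.EOF {
		break // no more tokens
	}
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s %q\n", t.Type(), t.Value())
}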
type Token ¶
type Token interface {
	// Type provides a string identifying the type of the token. Type must
	// always return the same string and that string must not be used
	// by any other type of token used in one Gorkov instance. It may not
	// contain any null bytes.
	//
	// Tokens generated by this package will only use type names consisting
	// of a single ASCII letter or digit (a-z, A-Z and 0-9).
	Type() string

	// Identifier returns a string identifying this particular token.
	// Identifier must always return the same string for one token and
	// that string may not contain any null bytes.
	//
	// For two tokens a and b the following must hold:
	//     a.Type() == b.Type() && a.Identifier() == b.Identifier()
	// is true iff a and b are considered equal.
	Identifier() string

	// Value returns the string that is used when generating a text using
	// this token. This is usually a static string, but can also be
	// dynamically generated.
	Value() string
}
Token is one element of a Markov chain. Usually this is a word or some whitespace.
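Any type with these three methods can be used as a Token. A minimal sketch of a custom implementation (staticToken is a hypothetical name, not part of this package):

type staticToken struct {
	typ, id, val string
}

func (t staticToken) Type() string       { return t.typ }
func (t staticToken) Identifier() string { return t.id }
func (t staticToken) Value() string      { return t.val }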
type Tokenizer ¶
type Tokenizer interface {
	// Next returns the next token, if possible. If there are no more
	// tokens, io.EOF will be returned. Next can also return other
	// errors, so consumers have to check for them.
	Next() (Token, error)
}
Tokenizer allows consuming a stream of tokens.
type TokenizerFunc ¶
type TokenizerFunc func() (Token, error)
TokenizerFunc can be used to turn a function into a Tokenizer:
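For example, a sketch of wrapping a closure as a Tokenizer (this assumes TokenizerFunc has a Next method that simply calls the function, analogous to http.HandlerFunc; the helper name tokens is illustrative):

// tokens returns a Tokenizer that yields the given tokens in order
// and then reports io.EOF, as the Tokenizer contract requires.
func tokens(ts ...Token) Tokenizer {
	i := 0
	return TokenizerFunc(func() (Token, error) {
		if i >= len(ts) {
			return nil, io.EOF
		}
		t := ts[i]
		i++
		return t, nil
	})
}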