Documentation ¶
Index ¶
- func DebugStringify(ts []TokenValue) (ret string)
- func Equals(ts []TokenValue, ws Span) (okay bool)
- func FindExactMatch(ts []TokenValue, spans []Span) (ret int)
- func HasPrefix(ts []TokenValue, prefix []Word) (okay bool)
- func Hash(s string) uint64
- func JoinWords(ws []Word) string
- func NewTokenizer(n Notifier) charm.State
- func NormalizeAll(ts []TokenValue) (ret string, err error)
- func NormalizeTokens(ts []TokenValue) (ret string, width int)
- func Stringify(ts []TokenValue) (ret string, width int)
- func StripArticle(str string) (ret string)
- type AfterDocument
- type AsyncDoc
- type Collector
- type Notifier
- type Pos
- type Span
- type SpanList
- type Token
- type TokenValue
- type Tokenizer
- type Word
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DebugStringify ¶
func DebugStringify(ts []TokenValue) (ret string)
turn all of the passed tokens into a helpful string representation
func Equals ¶
func Equals(ts []TokenValue, ws Span) (okay bool)
func FindExactMatch ¶
func FindExactMatch(ts []TokenValue, spans []Span) (ret int)
search for a span in a list of spans; return the index of the span that matched.
func HasPrefix ¶
func HasPrefix(ts []TokenValue, prefix []Word) (okay bool)
func Hash ¶
func Hash(s string) uint64
func JoinWords ¶
func JoinWords(ws []Word) string
func NewTokenizer ¶
func NewTokenizer(n Notifier) charm.State
func NormalizeAll ¶
func NormalizeAll(ts []TokenValue) (ret string, err error)
same as Normalize, but errors if not all of the tokens were consumed.
func NormalizeTokens ¶ added in v0.24.8
func NormalizeTokens(ts []TokenValue) (ret string, width int)
turn a series of string tokens into a normalized string; returns the number of string tokens consumed. somewhat dubious because it mimics inflect.Normalize without calling it.
func Stringify ¶
func Stringify(ts []TokenValue) (ret string, width int)
turn a series of string tokens into a space padded string; returns the number of string tokens consumed.
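As a rough illustration, the sketch below runs TokenizeString (documented under TokenValue, below) and hands the result to Stringify and NormalizeTokens. The input text is an assumption, and the example is written as if it sat in the package's own test file.

func ExampleStringify() {
	// "hello there world" is an illustrative input, not one taken from this page.
	ts, err := TokenizeString("hello there world")
	if err != nil {
		panic(err)
	}
	// a space padded string plus the count of string tokens consumed.
	padded, width := Stringify(ts)
	fmt.Println(padded, width)
	// the normalized form of the same tokens.
	normal, _ := NormalizeTokens(ts)
	fmt.Println(normal)
}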
func StripArticle ¶
func StripArticle(str string) (ret string)
return the name after removing leading articles; eats any errors it encounters and returns the original name.
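A small sketch; it assumes "the" counts as a leading article, since the exact article set isn't listed on this page.

func ExampleStripArticle() {
	// assumes "the" is one of the recognized leading articles.
	fmt.Println(StripArticle("the gallery"))
	// a name with no leading article should come back unchanged.
	fmt.Println(StripArticle("gallery"))
}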
Types ¶
type AfterDocument ¶
handle the parsed document. the document data also includes the unprocessed content which ended the document. ( ex. deindentation )
type AsyncDoc ¶ added in v0.24.8
type AsyncDoc struct {
	// the final document ( or error if file.ReadTellRunes failed )
	Content any
	// contains filtered or unexported fields
}
reads a document via channels ( which allows reading a (sub) document to become a state in a larger document )
func (AsyncDoc) ParseUnhandledContent ¶ added in v0.24.8
Sub-documents are defined by their indentation level. And, on each new line they have to collect enough whitespace to determine whether the line is part of their content. If the line has a lesser indent, the doc ends, but it still has the whitespace it collected (which the parent doc needs.) ParseUnhandledContent() sends that whitespace to the passed state.
type Collector ¶
type Collector struct {
	Tokens []TokenValue
	// lines is filled from Tokens on every new line.
	// it's empty if BreakLines is false.
	// Tokens can have values with trailing assignments,
	// ie. ':' isn't considered an end of line here....
	// tbd: it might be nice to change that so only lines *or* tokens is valid.
	Lines        [][]TokenValue
	KeepComments bool
	BreakLines   bool
	LineOffset   int
}
implements Notifier to accumulate tokens from the parser
func (*Collector) Decoded ¶
func (at *Collector) Decoded(tv TokenValue) error
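A minimal sketch of the Collector acting as a Notifier. It hand-feeds a single token through Decoded rather than running the Tokenizer, since driving the tokenizer's charm.State isn't covered on this page; the assumption is that Decoded appends each token to Tokens.

func ExampleCollector() {
	var c Collector
	// normally the Tokenizer calls Decoded; here a token is fed by hand.
	if err := c.Decoded(TokenValue{Token: String, Value: "hello"}); err != nil {
		panic(err)
	}
	fmt.Println(len(c.Tokens)) // expected: the one hand-fed token
}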
func (*Collector) TokenizeString ¶ added in v0.24.8
lineOffset adjusts the positions in the parsed tokens.
type Notifier ¶
type Notifier interface {
Decoded(TokenValue) error
}
callback when a new token exists. tbd: maybe a channel instead?
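A sketch of a custom Notifier; the counting behavior is purely illustrative, and feeding runes to the returned charm.State is omitted because that API isn't documented on this page.

// counter implements Notifier by tallying decoded tokens.
type counter struct {
	tokens int
}

func (c *counter) Decoded(tv TokenValue) error {
	c.tokens++
	return nil
}

// the tokenizer reports each token to the notifier it was built with;
// driving the returned state is up to the charm package.
var _ = NewTokenizer(&counter{})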
type Span ¶
type Span []Word
Span - implements Match for a chain of individual words.
func FindCommonArticles ¶
func FindCommonArticles(ts []TokenValue) (ret Span, width int)
for now, the common articles are a fixed set. when the author specifies some particular indefinite article for a noun, that article only gets used for printing the noun; it doesn't enhance the parsing of the story. ( it would take some work to lightly hold the relation between a name and an article, then parse a sentence matching names to nouns. fwiw: the articles in inform also seem to be predetermined in this way. )
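A sketch under the assumption that "the" belongs to the fixed set of common articles; the input string is illustrative.

func ExampleFindCommonArticles() {
	ts, err := TokenizeString("the town square")
	if err != nil {
		panic(err)
	}
	// if the leading tokens name a common article, span holds those words
	// and width counts how many tokens they covered.
	span, width := FindCommonArticles(ts)
	fmt.Println(len(span), width)
}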
type SpanList ¶
type SpanList []Span
func PanicSpans ¶
func (SpanList) FindExactMatch ¶
func (ws SpanList) FindExactMatch(ts []TokenValue) (ret Span, width int)
func (SpanList) FindPrefix ¶
func (ws SpanList) FindPrefix(words []TokenValue) (ret Span, width int)
this is the same as FindPrefixIndex only it returns a Span instead of an index
func (SpanList) FindPrefixIndex ¶
func (ws SpanList) FindPrefixIndex(words []TokenValue) (retWhich int, retWidth int)
see if anything in our span list starts the passed words. for instance, if the span list contains the span "oh hello" then the words "oh hello world" will match. returns the index of the matching span and the length of the longest prefix.
type Token ¶
type Token int
const (
	Invalid Token = iota // placeholder, not generated by the tokenizer
	Comma                // a comma
	Comment              // ex. `# something`, minus the hash
	Parenthetical        // ex. `( something )`, minus parens
	Quoted               // ex. `"something"`, minus the quotes
	Stop                 // full stop or other terminal
	String               // delimited by spaces and other special runes
	Tell                 // tell subdoc
)
types of tokens
type TokenValue ¶
type TokenValue struct {
	Token Token
	Pos   Pos
	Value any  // a string, except for Tell subdocuments
	First bool // helper to know if this is the first token of a sentence
}
func TokenizeString ¶ added in v0.24.8
func TokenizeString(str string) (ret []TokenValue, err error)
uses Collector to turn the passed string into a slice of tokens. by default, throws out all comments and merges newlines.
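A sketch of inspecting the resulting tokens with the Token constants listed above; the input string and the handling of each case are illustrative.

func ExampleTokenizeString() {
	ts, err := TokenizeString(`say "hello", then stop.`)
	if err != nil {
		panic(err)
	}
	fmt.Println(DebugStringify(ts))
	for _, tv := range ts {
		switch tv.Token {
		case String:
			fmt.Println("word:", tv.Value)
		case Quoted:
			fmt.Println("quoted:", tv.Value)
		case Comma:
			fmt.Println("comma")
		case Stop:
			fmt.Println("full stop")
		}
	}
}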
func (TokenValue) Equals ¶
func (w TokenValue) Equals(other uint64) bool
func (TokenValue) Hash ¶
func (tv TokenValue) Hash() (ret uint64)
func (TokenValue) String ¶
func (tv TokenValue) String() (ret string)
a string *representation* of the value
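A sketch pairing the package-level Hash function with TokenValue.Equals; the signatures suggest that pairing, but this page doesn't state it, so treat the expected result as an assumption.

func ExampleTokenValue_Equals() {
	ts, err := TokenizeString("hello world")
	if err != nil {
		panic(err)
	}
	// assumed: a string token compares equal to the hash of its text.
	fmt.Println(ts[0].Equals(Hash("hello")))
	// String returns a representation of the token's value.
	fmt.Println(ts[0].String())
}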
type Tokenizer ¶
type Tokenizer struct {
	Notifier
	// contains filtered or unexported fields
}
read pieces of plain text documents