Documentation ¶
Types ¶
type Extensions ¶
type Extensions []string
Extensions is a list of file extensions. It is used to find and tokenize the snippet files in directories.
func (Extensions) ReadLines ¶ added in v0.0.9
func (e Extensions) ReadLines(dirs ...string) apoco.StreamFunc
ReadLines returns a stream function that reads the snippet files in the given directories (identified by the file extensions in e) and returns a stream of line tokens. The directories are read in parallel by GOMAXPROCS goroutines.
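The parallel read can be pictured as a plain worker pool. The following is a minimal sketch under assumptions, not the package's implementation: `processDirs` and its `work` callback are hypothetical names, and a string stands in for whatever is actually read from a directory.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// processDirs is a hypothetical helper (not part of the package). It fans
// the directories out to GOMAXPROCS worker goroutines and collects one
// result string per directory.
func processDirs(dirs []string, work func(dir string) string) []string {
	n := runtime.GOMAXPROCS(0) // 0 queries the current setting without changing it
	in := make(chan string)
	out := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for dir := range in {
				out <- work(dir)
			}
		}()
	}
	// Feed the directories, then close the input so the workers stop.
	go func() {
		for _, dir := range dirs {
			in <- dir
		}
		close(in)
	}()
	// Close the output once every worker has returned.
	go func() {
		wg.Wait()
		close(out)
	}()

	var results []string
	for r := range out {
		results = append(results, r)
	}
	sort.Strings(results) // workers finish in arbitrary order
	return results
}

func main() {
	dirs := []string{"b", "a", "c"}
	res := processDirs(dirs, func(dir string) string { return "read " + dir })
	fmt.Println(res)
}
```

Because the workers finish in no particular order, the sketch sorts the results before returning them. A consumer of a real stream should not rely on directory order unless the package documents it.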
If an extension ends with `.txt`, one line is read from the text file (without confidences); if it ends with `.json`, calamari's extended data format is assumed. Otherwise the file is read as a TSV file that holds a char (or a sequence of chars) and its confidence on each line.
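The extension rule above can be sketched as a small dispatch function. This is a minimal illustration, not the package's implementation: `formatFor` and `parseTSVLine` are hypothetical helpers, the sample extensions are made up, and the column order of the TSV line (chars first, confidence second) is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatFor is a hypothetical helper that mirrors the documented rule:
// ".txt" means plain text, ".json" means calamari's extended data format,
// and anything else is treated as TSV.
func formatFor(ext string) string {
	switch {
	case strings.HasSuffix(ext, ".txt"):
		return "text"
	case strings.HasSuffix(ext, ".json"):
		return "calamari"
	default:
		return "tsv"
	}
}

// parseTSVLine splits one "chars<TAB>confidence" line. The column order
// is an assumption made for this sketch.
func parseTSVLine(line string) (string, float64, error) {
	parts := strings.SplitN(line, "\t", 2)
	if len(parts) != 2 {
		return "", 0, fmt.Errorf("missing tab in %q", line)
	}
	conf, err := strconv.ParseFloat(parts[1], 64)
	if err != nil {
		return "", 0, fmt.Errorf("bad confidence in %q: %w", line, err)
	}
	return parts[0], conf, nil
}

func main() {
	// Sample extensions chosen for illustration only.
	for _, ext := range []string{".gt.txt", ".pred.json", ".prob"} {
		fmt.Println(ext, "is read as", formatFor(ext))
	}
	chars, conf, err := parseTSVLine("a\t0.97")
	fmt.Println(chars, conf, err)
}
```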
func (Extensions) Tokenize ¶
func (e Extensions) Tokenize(ctx context.Context, dirs ...string) apoco.StreamFunc
Tokenize is a helper function that combines ReadLines and TokenizeLines into a single stream function. It is equivalent to calling `apoco.Pipe(ReadLines, TokenizeLines,...)`.
func (Extensions) TokenizeLines ¶ added in v0.0.9
func (e Extensions) TokenizeLines() apoco.StreamFunc
TokenizeLines returns a stream function that tokenizes and aligns line tokens.