Documentation ¶
Index ¶
- func Filter(vs chan string, filters ...Predicate) chan string
- func IsNotStopWord(v string) bool
- func IsStopWord(v string) bool
- func IsWord(v string) bool
- func LoadStopWords(filename string) error
- func Map(vs chan string, f ...Mapper) chan string
- func ScanAlphaWords(data []byte, atEOF bool) (advance int, token []byte, err error)
- func WordCounts(r io.Reader) (map[string]int, error)
- type Classifier
- type Mapper
- type Predicate
- type StdOption
- type StdTokenizer
- type Tokenizer
- type WeightScheme
- type WeightSchemeStrategy
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Filter ¶
func Filter(vs chan string, filters ...Predicate) chan string
Filter removes elements from the input channel where the supplied predicate is satisfied. Filter is a Predicate aggregation
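A minimal usage sketch, assuming the package is imported as nlp (the import path below is a placeholder). Per the doc comment above, Filter drops every token for which a predicate returns true, so filtering on IsStopWord strips stop words from the stream:

package main

import (
	"fmt"

	"example.com/nlp" // placeholder import path for this package
)

func main() {
	in := make(chan string)
	go func() {
		defer close(in)
		for _, w := range []string{"the", "quick", "brown", "fox"} {
			in <- w
		}
	}()
	// Per the doc comment, tokens satisfying IsStopWord are removed.
	for w := range nlp.Filter(in, nlp.IsStopWord) {
		fmt.Println(w) // quick, brown, fox
	}
}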
func IsNotStopWord ¶
func IsNotStopWord(v string) bool
IsNotStopWord is the inverse of IsStopWord
func IsStopWord ¶
func IsStopWord(v string) bool
IsStopWord checks against a list of known English stop words and returns true if v is a stop word; false otherwise
func IsWord ¶
func IsWord(v string) bool
IsWord is a predicate that reports whether a string contains at least two characters and does not contain any numbers
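The predicates can also be used standalone. A short sketch, again with a placeholder import path; whether a particular word appears in the stop-word list is an assumption about the package's built-in list:

package main

import (
	"fmt"

	"example.com/nlp" // placeholder import path
)

func main() {
	fmt.Println(nlp.IsWord("fox"))        // true: two or more characters, no numbers
	fmt.Println(nlp.IsWord("f"))          // false: fewer than two characters
	fmt.Println(nlp.IsWord("4x4"))        // false: contains numbers
	fmt.Println(nlp.IsStopWord("the"))    // true, assuming "the" is in the built-in list
	fmt.Println(nlp.IsNotStopWord("the")) // false: the inverse of IsStopWord
}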
func LoadStopWords ¶
func LoadStopWords(filename string) error
func Map ¶
func Map(vs chan string, f ...Mapper) chan string
func ScanAlphaWords ¶
func ScanAlphaWords(data []byte, atEOF bool) (advance int, token []byte, err error)
ScanAlphaWords is a split function that breaks text on whitespace, punctuation, and symbols; it is derived from bufio.ScanWords
func WordCounts ¶
func WordCounts(r io.Reader) (map[string]int, error)
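ScanAlphaWords has the bufio.SplitFunc signature, so it plugs directly into a bufio.Scanner, while WordCounts tallies token frequencies from any io.Reader. A sketch with a placeholder import path; the exact counts assume no extra normalization inside WordCounts:

package main

import (
	"bufio"
	"fmt"
	"strings"

	"example.com/nlp" // placeholder import path
)

func main() {
	sc := bufio.NewScanner(strings.NewReader("Hello, gopher... hello!"))
	sc.Split(nlp.ScanAlphaWords) // split on whitespace, punctuation, and symbols
	for sc.Scan() {
		fmt.Println(sc.Text()) // each word, without the surrounding punctuation
	}

	counts, err := nlp.WordCounts(strings.NewReader("gopher gopher fox"))
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["gopher"]) // 2, assuming no additional filtering is applied
}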
Types ¶
type Classifier ¶
type Classifier interface {
	// Train allows clients to train the classifier
	Train(io.Reader, string) error
	// TrainString allows clients to train the classifier using a string
	TrainString(string, string) error
	// Classify performs a classification on the input corpus and assumes that
	// the underlying classifier has been trained.
	Classify(io.Reader) (string, error)
	// ClassifyString performs text classification using a string
	ClassifyString(string) (string, error)
}
Classifier provides a simple interface for different text classifiers
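Because Classifier is an interface, calling code can stay implementation-agnostic. A sketch of a generic helper, with a placeholder import path; the argument order for TrainString (document text first, label second) is an assumption that mirrors Train(io.Reader, string):

// trainAndClassify depends only on the Classifier interface, so any
// implementation from this package (or your own) can be plugged in.
func trainAndClassify(c nlp.Classifier, samples map[string]string, doc string) (string, error) {
	// samples maps label -> example text; the TrainString argument order
	// is assumed to mirror Train(io.Reader, string).
	for label, text := range samples {
		if err := c.TrainString(text, label); err != nil {
			return "", err
		}
	}
	return c.ClassifyString(doc)
}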
type StdOption ¶
type StdOption func(*StdTokenizer)
StdOption provides configuration settings for a StdTokenizer
func BufferSize ¶
BufferSize adjusts the size of the buffered channel
type StdTokenizer ¶
type StdTokenizer struct {
// contains filtered or unexported fields
}
StdTokenizer provides a common document tokenizer that splits a document by word boundaries
func NewTokenizer ¶
func NewTokenizer(opts ...StdOption) *StdTokenizer
NewTokenizer initializes a new standard Tokenizer instance
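A construction sketch using the functional options above. The int argument to BufferSize is an assumption, since this page does not show its signature, and it assumes *StdTokenizer satisfies the Tokenizer interface below:

package main

import (
	"fmt"
	"strings"

	"example.com/nlp" // placeholder import path
)

func main() {
	tok := nlp.NewTokenizer(nlp.BufferSize(128)) // assumed: BufferSize takes an int
	for token := range tok.Tokenize(strings.NewReader("The quick brown fox")) {
		fmt.Println(token)
	}
}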
type Tokenizer ¶
type Tokenizer interface {
	// Tokenize breaks the provided document into a channel of tokens
	Tokenize(io.Reader) chan string
}
Tokenizer provides a common interface to tokenize documents
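The single-method contract makes alternative tokenizers easy to supply. A sketch of a whitespace-only implementation built on bufio.ScanWords, with a placeholder import path for the compile-time interface check:

package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"

	"example.com/nlp" // placeholder import path
)

// wsTokenizer splits purely on whitespace, unlike StdTokenizer.
type wsTokenizer struct{}

func (wsTokenizer) Tokenize(r io.Reader) chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		sc := bufio.NewScanner(r)
		sc.Split(bufio.ScanWords)
		for sc.Scan() {
			out <- sc.Text()
		}
	}()
	return out
}

// Compile-time check that the custom type satisfies the interface.
var _ nlp.Tokenizer = wsTokenizer{}

func main() {
	for w := range (wsTokenizer{}).Tokenize(strings.NewReader("a b c")) {
		fmt.Println(w)
	}
}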
type WeightScheme ¶
WeightScheme provides a contract for term frequency weight schemes
func BagOfWords ¶
func BagOfWords(doc map[string]float64) WeightScheme
BagOfWords weight scheme: counts the number of occurrences
func Binary ¶
func Binary(doc map[string]float64) WeightScheme
Binary weight scheme: 1 if present; 0 otherwise
func LogNorm ¶
func LogNorm(doc map[string]float64) WeightScheme
LogNorm weight scheme: returns the natural log of the number of occurrences of a term
func TermFrequency ¶
func TermFrequency(doc map[string]float64) WeightScheme
TermFrequency weight scheme: the number of occurrences of a term divided by the number of terms within the document
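The four schemes differ only in the formula applied to a document's raw term counts. The sketch below reproduces that arithmetic directly, as described above; it is plain Go for illustration, not the package's own types:

package main

import (
	"fmt"
	"math"
)

func main() {
	// Raw occurrence counts for one document.
	doc := map[string]float64{"fox": 3, "dog": 1}

	var total float64 // total number of terms in the document
	for _, n := range doc {
		total += n
	}

	for term, n := range doc {
		fmt.Printf("%s bag=%v binary=%v lognorm=%.3f tf=%.3f\n",
			term,
			n,              // BagOfWords: the occurrence count itself
			math.Min(n, 1), // Binary: 1 if present, 0 otherwise
			math.Log(n),    // LogNorm: natural log of the count, as documented
			n/total,        // TermFrequency: count divided by terms in the document
		)
	}
}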
type WeightSchemeStrategy ¶
type WeightSchemeStrategy func(doc map[string]float64) WeightScheme
WeightSchemeStrategy provides support for pluggable weight schemes
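Since every constructor above has the WeightSchemeStrategy shape, schemes can be selected at runtime. A sketch with a placeholder import path; this page does not show how the resulting WeightScheme is invoked, so the sketch stops at constructing it:

package main

import "example.com/nlp" // placeholder import path

func main() {
	// Pick a scheme by name at runtime; each value is a WeightSchemeStrategy.
	schemes := map[string]nlp.WeightSchemeStrategy{
		"bag":    nlp.BagOfWords,
		"binary": nlp.Binary,
		"log":    nlp.LogNorm,
		"tf":     nlp.TermFrequency,
	}

	doc := map[string]float64{"fox": 3, "dog": 1}

	// Bind the chosen scheme to this document's term counts.
	weigh := schemes["tf"](doc)
	_ = weigh // invoking a WeightScheme is not shown on this page
}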