Documentation
Overview
Package nlp provides basic NLP utilities.
Index
- Variables
- func Lda(docTokens [][]string, k int) (map[string][]float64, [][]int)
- func LdaThreads(docTokens [][]string, k, numThreads int) (map[string][]float64, [][]int)
- func Stem(s string) string
- func TfIdf(docTokens [][]string) []map[string]float64
- func Tokenize(s string, keepStopWords bool) []string
Constants
This section is empty.
Variables
var LdaVerbose = false
LdaVerbose determines whether progress information should be printed during LDA. For debugging.
var StopWords = map[string]bool{}/* 569 elements not displayed */
StopWords is a map of stop words, for token filtering. Modifying this map will affect the Tokenize function.
Taken from: http://www.ranks.nl/stopwords
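The effect of editing the stop-word map can be illustrated with a small self-contained sketch. It does not call this package: `filterStopWords` and the `stop` map are hypothetical stand-ins, and the sketch assumes only that filtering is a membership lookup in a `map[string]bool`, as the type of StopWords suggests.

```go
package main

import "fmt"

// filterStopWords returns the tokens that are not marked in stop.
// Hypothetical helper: it assumes filtering is a plain map lookup,
// which may differ from what Tokenize actually does.
func filterStopWords(tokens []string, stop map[string]bool) []string {
	var kept []string
	for _, t := range tokens {
		if !stop[t] {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	// stop is a stand-in for the package-level StopWords map.
	stop := map[string]bool{"the": true, "is": true, "a": true}
	tokens := []string{"the", "cat", "is", "on", "a", "mat"}
	fmt.Println(filterStopWords(tokens, stop)) // [cat on mat]

	// Adding or deleting entries changes all later filtering.
	stop["on"] = true
	delete(stop, "the")
	fmt.Println(filterStopWords(tokens, stop)) // [the cat mat]
}
```

Because the map is shared state, a change like the one above would apply to every later call, not only to the next one.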
var Tokenizer = regexp.MustCompile("\\w([\\w']*\\w)?")
Tokenizer splits text into tokens. This regexp represents a single word. Changing this regexp will affect the Tokenize function.
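To see what this pattern treats as a single word, the sketch below compiles the same expression with the standard `regexp` package and lists its matches. The names `wordRE` and `words` are illustrative only, and the sketch shows the regexp alone; any further processing that Tokenize applies (such as stop-word filtering) is not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRE uses the same pattern as the Tokenizer variable documented above.
var wordRE = regexp.MustCompile("\\w([\\w']*\\w)?")

// words returns every match of wordRE in s, in order of appearance.
func words(s string) []string {
	return wordRE.FindAllString(s, -1)
}

func main() {
	fmt.Printf("%q\n", words("Don't panic: it's 'quoted' text, a b."))
	// Prints: ["Don't" "panic" "it's" "quoted" "text" "a" "b"]
}
```

Apostrophes are kept only inside a word, so contractions survive while surrounding quote marks are dropped. Single-character words still match because the trailing group is optional.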
Functions
func Lda
Lda performs LDA (latent Dirichlet allocation) on the given data. docTokens should contain tokenized documents, such that docTokens[i][j] is the j'th token in the i'th document. k is the number of topics. Returns the topics and the token-topic assignments, where the assignments are indexed the same way as docTokens.
Topics are returned as a map from word to a probability vector, such that the i'th position is the probability of the i'th topic generating that word. For each i, the i'th entries of all words sum to 1.
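The shape of the two return values can be sketched with made-up data. The example below does not call Lda: the `topics` and `assignments` values are invented for k = 2, and `topicSums` and `topWord` are hypothetical helpers that only rely on the layout described above.

```go
package main

import "fmt"

// topicSums adds up, for each of the k topics, the probabilities of all words.
func topicSums(topics map[string][]float64, k int) []float64 {
	sums := make([]float64, k)
	for _, probs := range topics {
		for i, p := range probs {
			sums[i] += p
		}
	}
	return sums
}

// topWord returns the most probable word for the given topic.
func topWord(topics map[string][]float64, topic int) string {
	best, bestP := "", -1.0
	for w, probs := range topics {
		if probs[topic] > bestP {
			best, bestP = w, probs[topic]
		}
	}
	return best
}

func main() {
	// Invented values in the shape of the first return value; not real output.
	topics := map[string][]float64{
		"cat":  {0.5, 0.125},
		"dog":  {0.25, 0.125},
		"bond": {0.25, 0.75},
	}
	// Invented assignments, assuming assignments[i][j] is the topic
	// assigned to docTokens[i][j].
	docTokens := [][]string{{"cat", "dog"}, {"bond", "cat", "bond"}}
	assignments := [][]int{{0, 0}, {1, 0, 1}}

	fmt.Println(topicSums(topics, 2))                   // [1 1]
	fmt.Println(topWord(topics, 0), topWord(topics, 1)) // cat bond
	for i, doc := range docTokens {
		for j, tok := range doc {
			fmt.Printf("doc %d token %q -> topic %d\n", i, tok, assignments[i][j])
		}
	}
}
```

Summing each position over all words gives 1 for every topic, which is a quick sanity check on a result of this shape.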
func LdaThreads
LdaThreads is like Lda but runs on multiple goroutines. Calling this function with numThreads set to 1 is equivalent to calling Lda.
Types
This section is empty.