Documentation ¶
Index ¶
- Constants
- func MI(size, ab, amulb, span int) float64
- func MI3(size, ab, amulb, span int) float64
- type Collocation
- type DB
- func (db *DB) Add(text string) error
- func (db *DB) Close() error
- func (db *DB) Collocations(pat1, pat2 string, left, right int, scoreFunc ScoreFunc) ([]Collocation, error)
- func (db *DB) Drop() error
- func (db *DB) Ngrams(n int, scoreFunc ScoreFunc) ([]NgramWithScore, error)
- func (db *DB) Norms(minFrequency, minLength int) ([]WordWithFrequency, error)
- func (db *DB) WordCount() (int, error)
- func (db *DB) WordFrequencyMap() (map[string]int, error)
- func (db *DB) Words(minFrequency, minLength int) ([]WordWithFrequency, error)
- type NgramWithScore
- type ScoreFunc
- type WordScanner
- type WordWithFrequency
Constants ¶
const (
	Noun uint32 = 1 << iota // NOUN СУЩ noun
	AdjF                    // ADJF ПРИЛ adjective (full form)
	AdjS                    // ADJS КР_ПРИЛ adjective (short form)
	Comp                    // COMP КОМП comparative
	Verb                    // VERB ГЛ verb (finite form)
	Infn                    // INFN ИНФ verb (infinitive)
	PrtF                    // PRTF ПРИЧ participle (full form)
	PrtS                    // PRTS КР_ПРИЧ participle (short form)
	Grnd                    // GRND ДЕЕПР adverbial participle (gerund)
	Numr                    // NUMR ЧИСЛ numeral
	Advb                    // ADVB Н adverb
	Npro                    // NPRO МС pronoun (noun-like)
	Pred                    // PRED ПРЕДК predicative
	Prep                    // PREP ПР preposition
	Conj                    // CONJ СОЮЗ conjunction
	Prcl                    // PRCL ЧАСТ particle
	Intj                    // INTJ МЕЖД interjection

	Adj      = AdjF | AdjS
	Prt      = PrtF | PrtS
	VerbPlus = Verb | Infn | PrtF | PrtS | Grnd
	Unknown  = AdjF | AdjS | Comp | Verb | Infn | PrtF | PrtS | Grnd | Numr | Advb | Npro | Pred | Prep | Conj | Prcl | Intj
)
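The part-of-speech constants are single-bit flags, and the composite values (Adj, Prt, VerbPlus) are unions of those bits. A minimal sketch of testing a tag set against a mask follows. It redeclares only the first nine flags so that it is self-contained, and the helper has is hypothetical: this listing does not show how the package itself consumes these masks.

```go
package main

import "fmt"

// A subset of the package's flags, redeclared in the same order so the
// bit values match (Noun = 1, AdjF = 2, AdjS = 4, ...).
const (
	Noun uint32 = 1 << iota
	AdjF
	AdjS
	Comp
	Verb
	Infn
	PrtF
	PrtS
	Grnd

	Adj      = AdjF | AdjS
	Prt      = PrtF | PrtS
	VerbPlus = Verb | Infn | PrtF | PrtS | Grnd
)

// has is a hypothetical helper (not part of the package): it reports
// whether tags shares at least one bit with mask.
func has(tags, mask uint32) bool {
	return tags&mask != 0
}

func main() {
	tags := Noun | Infn // a word form that may be a noun or an infinitive

	fmt.Println(has(tags, VerbPlus)) // true: Infn is part of VerbPlus
	fmt.Println(has(tags, Adj))      // false: no adjective bit is set
}
```

Because each flag occupies its own bit, an ambiguous word form can carry several parts of speech at once, and one mask comparison covers a whole family of tags.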
Functions ¶
func MI ¶
https://www.english-corpora.org/mutualInformation.asp
In our corpora, Mutual Information is calculated as follows:
MI = log ( (AB * sizeCorpus) / (A * B * span) ) / log (2)
Suppose we are calculating the MI for the collocate color near purple in BNC.
A = frequency of node word (e.g. purple): 1262
B = frequency of collocate (e.g. color): 115
AB = frequency of collocate near the node word (e.g. color near purple): 24
sizeCorpus = size of corpus (# words; in this case the BNC): 96,263,399
span = span of words (e.g. 3 to left and 3 to right of node word): 6
log (2) is literally the log10 of the number 2: .30103
MI = 11.37 = log ( (24 * 96,263,399) / (1262 * 115 * 6) ) / .30103
Types ¶
type Collocation ¶
type NgramWithScore ¶
type WordScanner ¶
type WordScanner struct {
// contains filtered or unexported fields
}
func NewWordScanner ¶
func NewWordScanner(r io.Reader) *WordScanner
func (*WordScanner) Err ¶
func (s *WordScanner) Err() error
func (*WordScanner) Scan ¶
func (s *WordScanner) Scan() bool