idf

package
v0.70.0
Warning

This package is not in the latest version of its module.

Published: Jan 18, 2022 | License: Apache-2.0 | Imports: 6 | Imported by: 0

Documentation


Constants

This section is empty.

Variables

var StopWordMap = map[string]bool{
	"the":   true,
	"of":    true,
	"is":    true,
	"and":   true,
	"to":    true,
	"in":    true,
	"that":  true,
	"we":    true,
	"for":   true,
	"an":    true,
	"are":   true,
	"by":    true,
	"be":    true,
	"as":    true,
	"on":    true,
	"with":  true,
	"can":   true,
	"if":    true,
	"from":  true,
	"which": true,
	"you":   true,
	"it":    true,
	"this":  true,
	"then":  true,
	"at":    true,
	"have":  true,
	"all":   true,
	"not":   true,
	"one":   true,
	"has":   true,
	"or":    true,
}

StopWordMap is the default set of stop words.
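A map keyed by word makes stop-word checks a single lookup. The following is a minimal, standalone sketch of that idea; `stopWords` and `filterStopWords` are hypothetical names, the set is a small subset of the list above, and lowercasing tokens before the lookup is an assumption of the sketch rather than documented package behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is an illustrative subset of a stop-word set, keyed by word.
// It is a hypothetical stand-in, not the package's StopWordMap.
var stopWords = map[string]bool{
	"the": true,
	"of":  true,
	"is":  true,
	"and": true,
}

// filterStopWords returns the tokens that are not in the stop set.
// Tokens are lowercased for the lookup only (an assumption of this sketch).
func filterStopWords(tokens []string, stop map[string]bool) []string {
	kept := make([]string, 0, len(tokens))
	for _, tok := range tokens {
		if stop[strings.ToLower(tok)] {
			continue
		}
		kept = append(kept, tok)
	}
	return kept
}

func main() {
	tokens := strings.Fields("The frequency of the word is high")
	fmt.Println(filterStopWords(tokens, stopWords))
}
```

A missing key in a `map[string]bool` reads as `false`, so no explicit existence check is needed.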

Functions

This section is empty.

Types

type Idf

type Idf struct {
	// contains filtered or unexported fields
}

Idf is a dictionary of all words with their IDF (inverse document frequency) values.

func NewIdf

func NewIdf() *Idf

NewIdf creates a new Idf.

func (*Idf) AddToken

func (i *Idf) AddToken(text string, freq float64, pos ...string) error

AddToken adds a new word with its IDF to the dictionary.

func (*Idf) Freq

func (i *Idf) Freq(key string) (float64, string, bool)

Freq returns the IDF of the word.

func (*Idf) LoadDict

func (i *Idf) LoadDict(files ...string) error

LoadDict loads the IDF dictionary from the given files.

func (*Idf) NumTokens

func (i *Idf) NumTokens() int

NumTokens returns the number of tokens in the IDF dictionary.

func (*Idf) TotalFreq

func (i *Idf) TotalFreq() float64

TotalFreq returns the total frequency of the IDF dictionary.
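To make the shape of the Idf methods concrete, here is a small map-backed dictionary written from scratch. It is only a sketch: `idfDict`, `entry`, `add`, and `lookup` are hypothetical names, the three return values of `lookup` are read as (value, part of speech, found) by analogy with the `Freq` signature, and replacing an entry when a word is added twice is an assumption, not documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// entry holds the value and optional part-of-speech tag stored for one word.
type entry struct {
	freq float64
	pos  string
}

// idfDict is a hypothetical map-backed dictionary with a running total.
type idfDict struct {
	entries map[string]entry
	total   float64
}

func newIdfDict() *idfDict {
	return &idfDict{entries: make(map[string]entry)}
}

// add stores a word with its value and an optional part-of-speech tag.
// Adding an existing word replaces its entry (an assumption of this sketch).
func (d *idfDict) add(text string, freq float64, pos ...string) error {
	if text == "" {
		return errors.New("empty token")
	}
	if old, ok := d.entries[text]; ok {
		d.total -= old.freq
	}
	e := entry{freq: freq}
	if len(pos) > 0 {
		e.pos = pos[0]
	}
	d.entries[text] = e
	d.total += freq
	return nil
}

// lookup returns the stored value, the tag, and whether the word was found.
func (d *idfDict) lookup(key string) (float64, string, bool) {
	e, ok := d.entries[key]
	return e.freq, e.pos, ok
}

func main() {
	d := newIdfDict()
	_ = d.add("go", 7.5, "n")
	_ = d.add("idf", 9.0)
	freq, pos, ok := d.lookup("go")
	fmt.Println(freq, pos, ok, len(d.entries), d.total)
}
```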

type Segment

type Segment struct {
	// contains filtered or unexported fields
}

Segment is a word with a weight.

func (Segment) Text

func (s Segment) Text() string

Text returns the segment's text.

func (Segment) Weight

func (s Segment) Weight() float64

Weight returns the segment's weight.

type Segments

type Segments []Segment

Segments is a slice of Segment.

func (Segments) Len

func (ss Segments) Len() int

func (Segments) Less

func (ss Segments) Less(i, j int) bool

func (Segments) Swap

func (ss Segments) Swap(i, j int)
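Len, Less, and Swap together are the method set of the standard library's `sort.Interface`, so a slice type that defines them can be passed to `sort.Sort`. The sketch below shows the pattern on a local type, because Segment's fields are unexported; `weighted`, `byWeight`, and `sortDescending` are hypothetical names, and the ordering used (ascending weight, ties broken by text) is an assumption, not necessarily what Segments.Less does.

```go
package main

import (
	"fmt"
	"sort"
)

// weighted is a hypothetical stand-in for a word with a weight.
type weighted struct {
	text   string
	weight float64
}

// byWeight implements sort.Interface, ordering by ascending weight
// and breaking ties by text (an assumed ordering for this sketch).
type byWeight []weighted

func (w byWeight) Len() int { return len(w) }

func (w byWeight) Less(i, j int) bool {
	if w[i].weight == w[j].weight {
		return w[i].text < w[j].text
	}
	return w[i].weight < w[j].weight
}

func (w byWeight) Swap(i, j int) { w[i], w[j] = w[j], w[i] }

// sortDescending orders items from highest to lowest weight in place.
func sortDescending(items []weighted) {
	sort.Sort(sort.Reverse(byWeight(items)))
}

func main() {
	items := []weighted{{"go", 1.5}, {"idf", 3.2}, {"text", 0.7}}
	sortDescending(items)
	for _, it := range items {
		fmt.Printf("%s %.1f\n", it.text, it.weight)
	}
}
```

Wrapping the value in `sort.Reverse` flips the order without writing a second Less method.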

type StopWord

type StopWord struct {
	// contains filtered or unexported fields
}

StopWord is a dictionary for all stop words.

func NewStopWord

func NewStopWord() *StopWord

NewStopWord creates a new StopWord with the default stop words.

func (*StopWord) AddStop

func (s *StopWord) AddStop(text string)

AddStop adds a token to the StopWord dictionary.

func (*StopWord) IsStopWord

func (s *StopWord) IsStopWord(word string) bool

IsStopWord reports whether the word is a stop word.

func (*StopWord) LoadDict

func (s *StopWord) LoadDict(files ...string) error

LoadDict loads the stop word dictionary from the given files.

func (*StopWord) RemoveStop

func (s *StopWord) RemoveStop(text string)

RemoveStop removes a token from the StopWord dictionary.

type TagExtracter

type TagExtracter struct {
	Idf *Idf
	// contains filtered or unexported fields
}

TagExtracter extracts tags (keywords) from text.

func (*TagExtracter) ExtractTags

func (t *TagExtracter) ExtractTags(text string, topK int) (tags Segments)

ExtractTags extracts the topK keywords from text.
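Keyword extraction backed by an IDF dictionary is commonly done with TF-IDF scoring: each distinct token is weighted by its term frequency times its IDF, and the highest-weighted tokens are kept. The sketch below illustrates that general technique with the standard library only. It is not the package's implementation; `scored`, `topKByTFIDF`, and the `defaultIDF` fallback for unknown tokens are assumptions made for the example, and the input is assumed to be already segmented and stop-word filtered.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// scored is a hypothetical (text, weight) pair.
type scored struct {
	text   string
	weight float64
}

// topKByTFIDF scores each distinct token as (count / total tokens) * idf
// and returns at most topK results, highest weight first.
// Tokens missing from idf use defaultIDF (an assumption of this sketch).
func topKByTFIDF(tokens []string, idf map[string]float64, defaultIDF float64, topK int) []scored {
	if len(tokens) == 0 || topK <= 0 {
		return nil
	}
	counts := make(map[string]int)
	for _, t := range tokens {
		counts[t]++
	}
	total := float64(len(tokens))
	out := make([]scored, 0, len(counts))
	for t, c := range counts {
		w, ok := idf[t]
		if !ok {
			w = defaultIDF
		}
		out = append(out, scored{text: t, weight: float64(c) / total * w})
	}
	// Sort by descending weight; break ties by text for a stable result.
	sort.Slice(out, func(i, j int) bool {
		if out[i].weight != out[j].weight {
			return out[i].weight > out[j].weight
		}
		return out[i].text < out[j].text
	})
	if len(out) > topK {
		out = out[:topK]
	}
	return out
}

func main() {
	idf := map[string]float64{"golang": 8.0, "segmenter": 9.5, "fast": 4.0}
	tokens := strings.Fields("golang segmenter golang fast")
	for _, s := range topKByTFIDF(tokens, idf, 6.0, 2) {
		fmt.Printf("%s %.3f\n", s.text, s.weight)
	}
}
```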

func (*TagExtracter) LoadDict

func (t *TagExtracter) LoadDict(fileName ...string) error

LoadDict loads the given files and creates a new dictionary from them.

func (*TagExtracter) LoadIdf

func (t *TagExtracter) LoadIdf(fileName ...string) error

LoadIdf loads the given files and creates a new Idf dictionary from them.

func (*TagExtracter) LoadStopWords

func (t *TagExtracter) LoadStopWords(fileName ...string) error

LoadStopWords loads the given files and creates a new StopWord dictionary from them.

func (*TagExtracter) WithGse

func (t *TagExtracter) WithGse(segs gse.Segmenter)

WithGse registers the gse segmenter.

type TextRanker

type TextRanker struct {
	HMM bool
	// contains filtered or unexported fields
}

TextRanker extracts keywords from text using the TextRank algorithm.

func (*TextRanker) LoadDict

func (t *TextRanker) LoadDict(fileName ...string) error

LoadDict loads the given files and creates a new dictionary for the TextRanker.

func (*TextRanker) TextRank

func (t *TextRanker) TextRank(text string, topK int) Segments

TextRank extracts keywords from text using the TextRank algorithm. The topK parameter specifies the maximum number of keywords to return.
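For readers unfamiliar with TextRank, the general idea is to link tokens that occur near each other into a graph and then score the nodes with a PageRank-style iteration. The sketch below shows that idea with the standard library only. It is not this package's implementation: `ranked` and `textRank` are hypothetical names, and the window size, iteration count, and damping factor of 0.85 are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ranked is a hypothetical (text, score) pair.
type ranked struct {
	text  string
	score float64
}

// textRank links tokens that fall within a span of `window` consecutive
// tokens, runs a weighted PageRank iteration on the undirected graph,
// and returns at most topK tokens, highest score first.
func textRank(tokens []string, window, iters, topK int) []ranked {
	const damping = 0.85 // commonly used value; an assumption here

	// edges[a][b] counts how often a and b co-occur inside the window.
	edges := make(map[string]map[string]float64)
	addEdge := func(a, b string) {
		if edges[a] == nil {
			edges[a] = make(map[string]float64)
		}
		edges[a][b]++
	}
	for i, a := range tokens {
		for j := i + 1; j < len(tokens) && j < i+window; j++ {
			if b := tokens[j]; a != b {
				addEdge(a, b)
				addEdge(b, a)
			}
		}
	}

	// outSum is the total edge weight of each node, used for normalisation.
	outSum := make(map[string]float64, len(edges))
	scores := make(map[string]float64, len(edges))
	for node, nbrs := range edges {
		scores[node] = 1.0
		for _, w := range nbrs {
			outSum[node] += w
		}
	}

	for it := 0; it < iters; it++ {
		next := make(map[string]float64, len(scores))
		for node, nbrs := range edges {
			sum := 0.0
			for nbr, w := range nbrs {
				// The graph is undirected, so w is also the weight nbr -> node.
				sum += w / outSum[nbr] * scores[nbr]
			}
			next[node] = (1 - damping) + damping*sum
		}
		scores = next
	}

	out := make([]ranked, 0, len(scores))
	for t, s := range scores {
		out = append(out, ranked{text: t, score: s})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].score != out[j].score {
			return out[i].score > out[j].score
		}
		return out[i].text < out[j].text
	})
	if topK >= 0 && len(out) > topK {
		out = out[:topK]
	}
	return out
}

func main() {
	tokens := strings.Fields("graph ranking graph keyword graph scoring")
	for _, r := range textRank(tokens, 2, 30, 2) {
		fmt.Printf("%s %.3f\n", r.text, r.score)
	}
}
```

In the sample input every other token is adjacent to "graph", so it becomes the hub of the graph and receives the highest score.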

func (*TextRanker) TextRankWithPOS

func (t *TextRanker) TextRankWithPOS(text string, topK int, allowPOS []string) Segments

TextRankWithPOS extracts keywords from text using the TextRank algorithm. The allowPOS parameter is a list of allowed part-of-speech tags.

func (*TextRanker) WithGse

func (t *TextRanker) WithGse(segs gse.Segmenter)

WithGse registers the gse segmenter.
