idf

package

v0.70.0 (latest) | Published: Jan 18, 2022 | License: Apache-2.0 | Imports: 6 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/RicardoL1u/gse

Documentation ¶

Index ¶

Variables
type Idf
- func NewIdf() *Idf
type Segment
- func (s Segment) Text() string
- func (s Segment) Weight() float64
type Segments
type StopWord
- func NewStopWord() *StopWord
type TagExtracter
type TextRanker

Constants ¶

This section is empty.

Variables ¶

var StopWordMap = map[string]bool{
	"the":   true,
	"of":    true,
	"is":    true,
	"and":   true,
	"to":    true,
	"in":    true,
	"that":  true,
	"we":    true,
	"for":   true,
	"an":    true,
	"are":   true,
	"by":    true,
	"be":    true,
	"as":    true,
	"on":    true,
	"with":  true,
	"can":   true,
	"if":    true,
	"from":  true,
	"which": true,
	"you":   true,
	"it":    true,
	"this":  true,
	"then":  true,
	"at":    true,
	"have":  true,
	"all":   true,
	"not":   true,
	"one":   true,
	"has":   true,
	"or":    true,
}

StopWordMap is the default set of stop words.

Functions ¶

This section is empty.

Types ¶

type Idf ¶

type Idf struct {
	// contains filtered or unexported fields
}

Idf is a dictionary of words and their IDF (inverse document frequency) values.

func NewIdf ¶

func NewIdf() *Idf

NewIdf creates a new Idf.

func (*Idf) AddToken ¶

func (i *Idf) AddToken(text string, freq float64, pos ...string) error

AddToken adds a new word with its IDF to the dictionary.

func (*Idf) Freq ¶

func (i *Idf) Freq(key string) (float64, string, bool)

Freq returns the IDF of the word.

func (*Idf) LoadDict ¶

func (i *Idf) LoadDict(files ...string) error

LoadDict loads the IDF dictionary from the given files.

func (*Idf) NumTokens ¶

func (i *Idf) NumTokens() int

NumTokens returns the number of tokens in the IDF dictionary.

func (*Idf) TotalFreq ¶

func (i *Idf) TotalFreq() float64

TotalFreq returns the total frequency of the IDF dictionary.
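
As background for the values an Idf dictionary holds, the sketch below computes a smoothed inverse document frequency, log(N / (1 + df)), over a tiny corpus. It is a generic illustration only: the function name computeIDF and the smoothing formula are assumptions, and this page does not say how the package's own dictionary values were derived.

```go
package main

import (
	"fmt"
	"math"
)

// computeIDF returns a smoothed inverse document frequency for every term
// in docs, where each document is already split into tokens.
// Hypothetical helper for illustration; not part of the idf package.
func computeIDF(docs [][]string) map[string]float64 {
	df := make(map[string]int) // number of documents that contain each term
	for _, doc := range docs {
		seen := make(map[string]bool)
		for _, tok := range doc {
			if !seen[tok] {
				seen[tok] = true
				df[tok]++
			}
		}
	}
	idf := make(map[string]float64, len(df))
	n := float64(len(docs))
	for tok, d := range df {
		// Assumed smoothing: the +1 keeps the denominator away from zero.
		idf[tok] = math.Log(n / (1 + float64(d)))
	}
	return idf
}

func main() {
	docs := [][]string{
		{"go", "is", "fast"},
		{"go", "is", "fun"},
		{"rust", "is", "safe"},
	}
	idf := computeIDF(docs)
	// Rarer terms receive a higher IDF than common ones.
	fmt.Println(idf["rust"] > idf["go"], idf["go"] > idf["is"])
}
```

Terms that appear in fewer documents score higher, which is why a word such as "is" contributes little to keyword ranking.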

type Segment ¶

type Segment struct {
	// contains filtered or unexported fields
}

Segment is a word with a weight.

func (Segment) Text ¶

func (s Segment) Text() string

Text returns the segment's text.

func (Segment) Weight ¶

func (s Segment) Weight() float64

Weight returns the segment's weight.

type Segments ¶

type Segments []Segment

Segments is a slice of Segment.

func (Segments) Len ¶

func (ss Segments) Len() int

func (Segments) Less ¶

func (ss Segments) Less(i, j int) bool

func (Segments) Swap ¶

func (ss Segments) Swap(i, j int)

type StopWord ¶

type StopWord struct {
	// contains filtered or unexported fields
}

StopWord is a dictionary for all stop words.

func NewStopWord ¶

func NewStopWord() *StopWord

NewStopWord creates a new StopWord with the default stop words.

func (*StopWord) AddStop ¶

func (s *StopWord) AddStop(text string)

AddStop adds a token to the StopWord dictionary.

func (*StopWord) IsStopWord ¶

func (s *StopWord) IsStopWord(word string) bool

IsStopWord reports whether the word is a stop word.

func (*StopWord) LoadDict ¶

func (s *StopWord) LoadDict(files ...string) error

LoadDict loads the stop word dictionary from the given files.

func (*StopWord) RemoveStop ¶

func (s *StopWord) RemoveStop(text string)

RemoveStop removes a token from the StopWord dictionary.
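
The add, remove and membership operations above map naturally onto a set backed by map[string]bool, the same shape as StopWordMap. The sketch below is a minimal illustration with hypothetical names (stopSet, filter); it is not the package's implementation, whose fields are unexported.

```go
package main

import "fmt"

// stopSet is a hypothetical stand-in for StopWord: a set of words backed by a map.
type stopSet map[string]bool

func (s stopSet) add(w string)      { s[w] = true }
func (s stopSet) remove(w string)   { delete(s, w) }
func (s stopSet) has(w string) bool { return s[w] }

// filter returns the tokens that are not in the set, preserving their order.
func filter(tokens []string, s stopSet) []string {
	var out []string
	for _, t := range tokens {
		if !s.has(t) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	s := stopSet{"the": true, "of": true, "is": true}
	s.add("a")     // now also filtered
	s.remove("of") // no longer filtered
	fmt.Println(filter([]string{"the", "weight", "of", "a", "word"}, s))
}
```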

type TagExtracter ¶

type TagExtracter struct {
	Idf *Idf
	// contains filtered or unexported fields
}

TagExtracter extracts tags (keywords) from text.

func (*TagExtracter) ExtractTags ¶

func (t *TagExtracter) ExtractTags(text string, topK int) (tags Segments)

ExtractTags extracts the topK keywords from text.
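
To make the idea of IDF-based top-K extraction concrete, the sketch below scores each token by its term count multiplied by an IDF value and keeps the k highest. Everything in it is an assumption for illustration: the names scored and topK, the use of raw counts as term frequency, and the defaultIDF fallback for unknown words. It does not reproduce how ExtractTags is implemented.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// scored is a hypothetical (text, weight) pair, similar in spirit to Segment.
type scored struct {
	text   string
	weight float64
}

// topK scores each token by term count times IDF and returns at most k
// results, highest weight first. Tokens missing from idf use defaultIDF,
// and stop words are skipped.
func topK(tokens []string, idf map[string]float64, defaultIDF float64, stop map[string]bool, k int) []scored {
	tf := make(map[string]float64)
	for _, t := range tokens {
		if t == "" || stop[t] {
			continue
		}
		tf[t]++
	}
	out := make([]scored, 0, len(tf))
	for t, f := range tf {
		w, ok := idf[t]
		if !ok {
			w = defaultIDF
		}
		out = append(out, scored{t, f * w})
	}
	// Sort by descending weight; ties are broken alphabetically for stable output.
	sort.Slice(out, func(i, j int) bool {
		if out[i].weight != out[j].weight {
			return out[i].weight > out[j].weight
		}
		return out[i].text < out[j].text
	})
	if k < 0 {
		k = 0
	}
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	tokens := strings.Fields("the ranker ranks the words and the ranker returns words")
	idf := map[string]float64{"ranker": 2.0, "words": 1.0, "ranks": 3.0}
	stop := map[string]bool{"the": true, "and": true}
	for _, s := range topK(tokens, idf, 1.5, stop, 2) {
		fmt.Println(s.text, s.weight)
	}
}
```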

func (*TagExtracter) LoadDict ¶

func (t *TagExtracter) LoadDict(fileName ...string) error

LoadDict loads and creates a new dictionary from the given files.

func (*TagExtracter) LoadIdf ¶

func (t *TagExtracter) LoadIdf(fileName ...string) error

LoadIdf loads and creates a new Idf dictionary from the given files.

func (*TagExtracter) LoadStopWords ¶

func (t *TagExtracter) LoadStopWords(fileName ...string) error

LoadStopWords loads and creates a new StopWord dictionary from the given files.

func (*TagExtracter) WithGse ¶

func (t *TagExtracter) WithGse(segs gse.Segmenter)

WithGse registers the gse segmenter.

type TextRanker ¶

type TextRanker struct {
	HMM bool
	// contains filtered or unexported fields
}

TextRanker extracts keywords from text using the TextRank algorithm.

func (*TextRanker) LoadDict ¶

func (t *TextRanker) LoadDict(fileName ...string) error

LoadDict loads and creates a new dictionary from the given files for the TextRanker.

func (*TextRanker) TextRank ¶

func (t *TextRanker) TextRank(text string, topK int) Segments

TextRank extracts keywords from text using the TextRank algorithm. The topK parameter specifies the maximum number of keywords to return.

func (*TextRanker) TextRankWithPOS ¶

func (t *TextRanker) TextRankWithPOS(text string, topK int, allowPOS []string) Segments

TextRankWithPOS extracts keywords from text using the TextRank algorithm. The allowPOS parameter lists the part-of-speech tags to allow.

func (*TextRanker) WithGse ¶

func (t *TextRanker) WithGse(segs gse.Segmenter)

WithGse registers the gse segmenter.
