analyse

package

Version: v0.0.0-...-36c17a1 (latest), Published: Dec 3, 2022, License: AGPL-3.0, Imports: 11, Imported by: 0

Repository

github.com/fumiama/jieba

Documentation ¶

Overview ¶

Package analyse is the Golang implementation of Jieba's analyse module.

Example (ExtractTags) ¶

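var t TagExtracter
// Both loaders return an error; stop early if either file cannot be read.
if err := t.LoadDictionaryAt("../dict.txt"); err != nil {
	panic(err)
}
if err := t.LoadIdfAt("idf.txt"); err != nil {
	panic(err)
}

sentence := "这是一个伸手不见五指的黑夜。我叫孙悟空，我爱北京，我爱Python和C++。"
// Extract the top 5 tags from the sentence.
segments := t.ExtractTags(sentence, 5)
fmt.Printf("Top %d tags:", len(segments))
for _, segment := range segments {
	fmt.Printf(" %s /", segment.Text())
}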

Output:

Top 5 tags: Python / C++ / 伸手不见五指 / 孙悟空 / 黑夜 /

Example (TextRank) ¶

t, err := NewTextRankerAt("../dict.txt")
if err != nil {
	panic(err)
}

sentence := "此外，公司拟对全资子公司吉林欧亚置业有限公司增资4.3亿元，增资后，吉林欧亚置业注册资本由7000万元增加到5亿元。吉林欧亚置业主要经营范围为房地产开发及百货零售等业务。目前在建吉林欧亚城市商业综合体项目。2013年，实现营业收入0万元，实现净利润-139.13万元。"

result := t.TextRank(sentence, 10)
for _, segment := range result {
	fmt.Printf("%s %f\n", segment.Text(), segment.Weight())
}

Output:

吉林 1.000000
欧亚 0.878078
置业 0.562048
实现 0.520906
收入 0.384284
增资 0.360591
子公司 0.353132
城市 0.307509
全资 0.306324
商业 0.306138

Index ¶

Variables
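type Idf
- func NewIdf() *Idf
- func (i *Idf) AddToken(token dictionary.Token)
- func (i *Idf) Frequency(key string) (float64, bool)
- func (i *Idf) Load(tokens ...dictionary.Token)
type Segment
- func (s Segment) Text() string
- func (s Segment) Weight() float64
type Segments
- func (ss Segments) Len() int
- func (ss Segments) Less(i, j int) bool
- func (ss Segments) Swap(i, j int)
type StopWord
- func NewStopWord() *StopWord
- func (s *StopWord) AddToken(token dictionary.Token)
- func (s *StopWord) IsStopWord(word string) bool
- func (s *StopWord) Load(tokens ...dictionary.Token)
type TagExtracter
- func (t *TagExtracter) ExtractTags(sentence string, topK int) (tags Segments)
- func (t *TagExtracter) LoadDictionary(file io.Reader) (err error)
- func (t *TagExtracter) LoadDictionaryAt(file string) (err error)
- func (t *TagExtracter) LoadIdf(file io.Reader) error
- func (t *TagExtracter) LoadIdfAt(fileName string) error
- func (t *TagExtracter) LoadStopWords(file io.Reader) error
- func (t *TagExtracter) LoadStopWordsAt(file string) error
type TextRanker
- func NewTextRanker(file io.Reader) (*TextRanker, error)
- func NewTextRankerAt(file string) (*TextRanker, error)
- func (t *TextRanker) TextRank(sentence string, topK int) Segments
- func (t *TextRanker) TextRankWithPOS(sentence string, topK int, allowPOS []string) Segments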

Constants ¶

This section is empty.

Variables ¶

var DefaultStopWordMap = map[string]int{
	"the":   1,
	"of":    1,
	"is":    1,
	"and":   1,
	"to":    1,
	"in":    1,
	"that":  1,
	"we":    1,
	"for":   1,
	"an":    1,
	"are":   1,
	"by":    1,
	"be":    1,
	"as":    1,
	"on":    1,
	"with":  1,
	"can":   1,
	"if":    1,
	"from":  1,
	"which": 1,
	"you":   1,
	"it":    1,
	"this":  1,
	"then":  1,
	"at":    1,
	"have":  1,
	"all":   1,
	"not":   1,
	"one":   1,
	"has":   1,
	"or":    1,
}

DefaultStopWordMap contains a small set of common English stop words, each mapped to 1.
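
A map of this shape lends itself to simple membership tests. The following is a minimal, self-contained sketch of that idea; `stopWords` and `filterStopWords` are hypothetical names used only for illustration and are not part of this package.

```go
package main

import "fmt"

// stopWords is a hypothetical stand-in with the same shape as DefaultStopWordMap.
var stopWords = map[string]int{"the": 1, "of": 1, "is": 1, "and": 1}

// filterStopWords returns the words that are not keys of the stop-word map.
func filterStopWords(words []string, stop map[string]int) []string {
	kept := make([]string, 0, len(words))
	for _, w := range words {
		if _, found := stop[w]; !found {
			kept = append(kept, w)
		}
	}
	return kept
}

func main() {
	words := []string{"the", "history", "of", "segmentation"}
	fmt.Println(filterStopWords(words, stopWords))
}
```

Only the presence of a key matters in this sketch; the stored value is never read.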

Functions ¶

This section is empty.

Types ¶

type Idf ¶

type Idf struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

Idf represents a thread-safe dictionary of words and their IDF (inverse document frequency) values.

func NewIdf ¶

func NewIdf() *Idf

NewIdf creates a new Idf instance.

func (*Idf) AddToken ¶

func (i *Idf) AddToken(token dictionary.Token)

AddToken adds a new word with its IDF to the dictionary.

func (*Idf) Frequency ¶

func (i *Idf) Frequency(key string) (float64, bool)

Frequency returns the IDF of the given word.

func (*Idf) Load ¶

func (i *Idf) Load(tokens ...dictionary.Token)

Load loads all tokens into its dictionary.
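
To illustrate how a dictionary with an embedded sync.RWMutex can be made safe for concurrent use, here is a minimal sketch. The `idfTable` type and its methods are hypothetical stand-ins, not the package's internals, and treating the boolean result as "word was found" is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// idfTable is a hypothetical dictionary guarded by an embedded RWMutex.
type idfTable struct {
	sync.RWMutex
	freq map[string]float64
}

func newIdfTable() *idfTable {
	return &idfTable{freq: make(map[string]float64)}
}

// add stores a word's IDF under the write lock.
func (t *idfTable) add(word string, idf float64) {
	t.Lock()
	defer t.Unlock()
	t.freq[word] = idf
}

// frequency reads a word's IDF under the read lock, so concurrent
// readers do not block each other.
func (t *idfTable) frequency(word string) (float64, bool) {
	t.RLock()
	defer t.RUnlock()
	v, ok := t.freq[word]
	return v, ok
}

func main() {
	table := newIdfTable()
	var wg sync.WaitGroup
	for i, w := range []string{"吉林", "欧亚", "置业"} {
		wg.Add(1)
		go func(word string, idf float64) {
			defer wg.Done()
			table.add(word, idf) // concurrent writers are serialized by the lock
		}(w, float64(i+1))
	}
	wg.Wait()
	v, ok := table.frequency("欧亚")
	fmt.Println(v, ok)
}
```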

type Segment ¶

type Segment struct {
	// contains filtered or unexported fields
}

Segment represents a word together with its weight.

func (Segment) Text ¶

func (s Segment) Text() string

Text returns the segment's text.

func (Segment) Weight ¶

func (s Segment) Weight() float64

Weight returns the segment's weight.

type Segments ¶

type Segments []Segment

Segments represents a slice of Segment.

func (Segments) Len ¶

func (ss Segments) Len() int

func (Segments) Less ¶

func (ss Segments) Less(i, j int) bool

func (Segments) Swap ¶

func (ss Segments) Swap(i, j int)
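
Len, Less, and Swap together satisfy sort.Interface, so a Segments value can be passed to the standard sort package. The sketch below shows the pattern with a hypothetical `weighted` type; the ordering used in `Less` (ascending weight, ties broken by text) is an assumption for illustration, not a statement about how this package compares segments.

```go
package main

import (
	"fmt"
	"sort"
)

// weighted is a hypothetical stand-in for Segment.
type weighted struct {
	text   string
	weight float64
}

// byWeight implements sort.Interface with an assumed ordering:
// ascending weight, ties broken by text.
type byWeight []weighted

func (b byWeight) Len() int      { return len(b) }
func (b byWeight) Swap(i, j int) { b[i], b[j] = b[j], b[i] }
func (b byWeight) Less(i, j int) bool {
	if b[i].weight == b[j].weight {
		return b[i].text < b[j].text
	}
	return b[i].weight < b[j].weight
}

// sortDescending puts the highest weight first by reversing the Less order.
func sortDescending(items byWeight) {
	sort.Sort(sort.Reverse(items))
}

func main() {
	items := byWeight{{"置业", 0.56}, {"吉林", 1.0}, {"欧亚", 0.88}}
	sortDescending(items)
	for _, it := range items {
		fmt.Printf("%s %.2f\n", it.text, it.weight)
	}
}
```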

type StopWord ¶

type StopWord struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

StopWord is a thread-safe dictionary for all stop words.

func NewStopWord ¶

func NewStopWord() *StopWord

NewStopWord creates a new StopWord with the default stop words.

func (*StopWord) AddToken ¶

func (s *StopWord) AddToken(token dictionary.Token)

AddToken adds a token to the StopWord dictionary.

func (*StopWord) IsStopWord ¶

func (s *StopWord) IsStopWord(word string) bool

IsStopWord reports whether the given word is a stop word.

func (*StopWord) Load ¶

func (s *StopWord) Load(tokens ...dictionary.Token)

Load loads all tokens into the StopWord dictionary.

type TagExtracter ¶

type TagExtracter struct {
	// contains filtered or unexported fields
}

TagExtracter is used to extract tags from a sentence.

func (*TagExtracter) ExtractTags ¶

func (t *TagExtracter) ExtractTags(sentence string, topK int) (tags Segments)

ExtractTags extracts the topK keywords from the sentence.
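
Keyword extraction backed by an IDF table is commonly done by TF-IDF scoring. The sketch below shows that general technique under stated assumptions: the input is already segmented, unknown words fall back to a caller-supplied default IDF, and the names `scored` and `topKByTFIDF` are hypothetical. It is not a description of what ExtractTags does internally.

```go
package main

import (
	"fmt"
	"sort"
)

// scored pairs a word with its TF-IDF score (hypothetical type).
type scored struct {
	word  string
	score float64
}

// topKByTFIDF counts each word's term frequency, multiplies it by idf[word]
// (or defaultIDF when the word is unknown), and returns the k best entries.
func topKByTFIDF(words []string, idf map[string]float64, defaultIDF float64, k int) []scored {
	tf := make(map[string]float64)
	for _, w := range words {
		tf[w]++
	}
	out := make([]scored, 0, len(tf))
	for w, f := range tf {
		weight, ok := idf[w]
		if !ok {
			weight = defaultIDF
		}
		out = append(out, scored{w, f * weight})
	}
	// Descending score; ties broken by word so the result is deterministic.
	sort.Slice(out, func(i, j int) bool {
		if out[i].score == out[j].score {
			return out[i].word < out[j].word
		}
		return out[i].score > out[j].score
	})
	if k >= 0 && k < len(out) {
		out = out[:k]
	}
	return out
}

func main() {
	words := []string{"黑夜", "我", "爱", "北京", "我", "爱", "Python"}
	idf := map[string]float64{"黑夜": 9.0, "北京": 4.0, "我": 1.0, "爱": 2.0}
	for _, s := range topKByTFIDF(words, idf, 6.0, 3) {
		fmt.Printf("%s %.1f\n", s.word, s.score)
	}
}
```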

func (*TagExtracter) LoadDictionary ¶

func (t *TagExtracter) LoadDictionary(file io.Reader) (err error)

LoadDictionary reads from the given reader and creates a new dictionary.

func (*TagExtracter) LoadDictionaryAt ¶

func (t *TagExtracter) LoadDictionaryAt(file string) (err error)

LoadDictionaryAt reads the file at the given path and creates a new dictionary.

func (*TagExtracter) LoadIdf ¶

func (t *TagExtracter) LoadIdf(file io.Reader) error

LoadIdf reads from the given reader and creates a new Idf dictionary.

func (*TagExtracter) LoadIdfAt ¶

func (t *TagExtracter) LoadIdfAt(fileName string) error

LoadIdfAt reads the file with the given name and creates a new Idf dictionary.
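
The loaders come in pairs: one takes an io.Reader, the other a file path. A common way to structure such a pair is to put the parsing in the reader-based function and let the path-based one open the file and delegate. The sketch below shows that pattern; the function names are hypothetical, and the two-column "word idf" line format is an assumption about the input file.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

// loadIDF parses whitespace-separated "word idf" lines from any io.Reader.
// The line format is an assumption made for this sketch.
func loadIDF(r io.Reader) (map[string]float64, error) {
	table := make(map[string]float64)
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", scanner.Text())
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, err
		}
		table[fields[0]] = v
	}
	return table, scanner.Err()
}

// loadIDFAt opens the file at path and delegates to loadIDF.
func loadIDFAt(path string) (map[string]float64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return loadIDF(f)
}

// parseIDF is a convenience wrapper for in-memory input.
func parseIDF(text string) (map[string]float64, error) {
	return loadIDF(strings.NewReader(text))
}

func main() {
	table, err := parseIDF("黑夜 9.5\n北京 4.2\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(table), table["黑夜"])
}
```

Keeping the parser on io.Reader makes it usable with files, network streams, and in-memory strings alike, which is also convenient for tests.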

func (*TagExtracter) LoadStopWords ¶

func (t *TagExtracter) LoadStopWords(file io.Reader) error

LoadStopWords reads from the given reader and creates a new StopWord dictionary.

func (*TagExtracter) LoadStopWordsAt ¶

func (t *TagExtracter) LoadStopWordsAt(file string) error

LoadStopWordsAt reads the file at the given path and creates a new StopWord dictionary.

type TextRanker ¶

type TextRanker posseg.Segmenter

TextRanker is used to extract tags from a sentence.

func NewTextRanker ¶

func NewTextRanker(file io.Reader) (*TextRanker, error)

NewTextRanker reads a dictionary from the given reader and creates a new TextRanker.

func NewTextRankerAt ¶

func NewTextRankerAt(file string) (*TextRanker, error)

NewTextRankerAt reads the dictionary file at the given path and creates a new TextRanker.

func (*TextRanker) TextRank ¶

func (t *TextRanker) TextRank(sentence string, topK int) Segments

TextRank extracts keywords from the sentence using the TextRank algorithm. The topK parameter specifies the maximum number of keywords to return.
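
For readers unfamiliar with TextRank, the sketch below shows the general idea on pre-segmented input: build an undirected co-occurrence graph with a sliding window, run a PageRank-style iteration, and scale the scores so the best word gets weight 1. The window size, iteration count, damping factor, normalization by the maximum, and all names here are assumptions chosen for illustration; the package's own implementation may differ in each of these.

```go
package main

import (
	"fmt"
	"sort"
)

// ranked pairs a word with its normalized score (hypothetical type).
type ranked struct {
	word   string
	weight float64
}

// textRank scores words with a PageRank-style iteration over an undirected
// co-occurrence graph built from a sliding window of size span.
func textRank(words []string, span, iterations int, damping float64) []ranked {
	// graph[a][b] counts how often a and b co-occur within the window.
	graph := make(map[string]map[string]float64)
	link := func(a, b string) {
		if graph[a] == nil {
			graph[a] = make(map[string]float64)
		}
		graph[a][b]++
	}
	for i := range words {
		for j := i + 1; j < i+span && j < len(words); j++ {
			if words[i] != words[j] {
				link(words[i], words[j])
				link(words[j], words[i])
			}
		}
	}
	if len(graph) == 0 {
		return nil
	}
	outSum := make(map[string]float64, len(graph))
	score := make(map[string]float64, len(graph))
	for n, edges := range graph {
		for _, w := range edges {
			outSum[n] += w
		}
		score[n] = 1 / float64(len(graph))
	}
	for it := 0; it < iterations; it++ {
		next := make(map[string]float64, len(graph))
		for n, edges := range graph {
			sum := 0.0
			for m, w := range edges {
				// Neighbor m passes on its score in proportion to the edge weight.
				sum += w / outSum[m] * score[m]
			}
			next[n] = (1 - damping) + damping*sum
		}
		score = next
	}
	maxScore := 0.0
	for _, s := range score {
		if s > maxScore {
			maxScore = s
		}
	}
	out := make([]ranked, 0, len(score))
	for n, s := range score {
		out = append(out, ranked{n, s / maxScore})
	}
	// Highest weight first; ties broken by word.
	sort.Slice(out, func(i, j int) bool {
		if out[i].weight == out[j].weight {
			return out[i].word < out[j].word
		}
		return out[i].weight > out[j].weight
	})
	return out
}

func main() {
	words := []string{"吉林", "欧亚", "置业", "增资", "吉林", "欧亚", "置业", "注册", "资本"}
	for _, r := range textRank(words, 5, 10, 0.85) {
		fmt.Printf("%s %f\n", r.word, r.weight)
	}
}
```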

func (*TextRanker) TextRankWithPOS ¶

func (t *TextRanker) TextRankWithPOS(sentence string, topK int, allowPOS []string) Segments

TextRankWithPOS extracts keywords from the sentence using the TextRank algorithm. The allowPOS parameter supplies a custom list of allowed part-of-speech tags.
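
A part-of-speech allow-list of this kind typically acts as a filter on tagged words before ranking. The following is a minimal sketch of such a filter; `taggedWord` and `filterByPOS` are hypothetical names, and the tags shown ("ns", "n", "v", "uj") are illustrative values in the style used by jieba rather than a list taken from this package.

```go
package main

import "fmt"

// taggedWord is a hypothetical (word, part-of-speech tag) pair.
type taggedWord struct {
	text string
	pos  string
}

// filterByPOS keeps only the words whose tag appears in allowPOS.
func filterByPOS(words []taggedWord, allowPOS []string) []taggedWord {
	allowed := make(map[string]bool, len(allowPOS))
	for _, p := range allowPOS {
		allowed[p] = true
	}
	kept := make([]taggedWord, 0, len(words))
	for _, w := range words {
		if allowed[w.pos] {
			kept = append(kept, w)
		}
	}
	return kept
}

func main() {
	words := []taggedWord{{"吉林", "ns"}, {"的", "uj"}, {"增资", "v"}, {"公司", "n"}}
	for _, w := range filterByPOS(words, []string{"ns", "n"}) {
		fmt.Println(w.text, w.pos)
	}
}
```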
