Documentation ¶
Overview ¶
Package analyse is a Go implementation of Jieba's analyse module, which extracts keywords from text.
Example (ExtractTags) ¶
var t TagExtracter
t.LoadDictionaryAt("../dict.txt")
t.LoadIdfAt("idf.txt")

sentence := "这是一个伸手不见五指的黑夜。我叫孙悟空,我爱北京,我爱Python和C++。"
segments := t.ExtractTags(sentence, 5)
fmt.Printf("Top %d tags:", len(segments))
for _, segment := range segments {
	fmt.Printf(" %s /", segment.Text())
}
Output:
Top 5 tags: Python / C++ / 伸手不见五指 / 孙悟空 / 黑夜 /
Example (TextRank) ¶
t, err := NewTextRankerAt("../dict.txt")
if err != nil {
	panic(err)
}

sentence := "此外,公司拟对全资子公司吉林欧亚置业有限公司增资4.3亿元,增资后,吉林欧亚置业注册资本由7000万元增加到5亿元。吉林欧亚置业主要经营范围为房地产开发及百货零售等业务。目前在建吉林欧亚城市商业综合体项目。2013年,实现营业收入0万元,实现净利润-139.13万元。"
result := t.TextRank(sentence, 10)
for _, segment := range result {
	fmt.Printf("%s %f\n", segment.Text(), segment.Weight())
}
Output:
吉林 1.000000
欧亚 0.878078
置业 0.562048
实现 0.520906
收入 0.384284
增资 0.360591
子公司 0.353132
城市 0.307509
全资 0.306324
商业 0.306138
Index ¶
- Variables
- type Idf
- type Segment
- type Segments
- type StopWord
- type TagExtracter
- func (t *TagExtracter) ExtractTags(sentence string, topK int) (tags Segments)
- func (t *TagExtracter) LoadDictionary(file io.Reader) (err error)
- func (t *TagExtracter) LoadDictionaryAt(file string) (err error)
- func (t *TagExtracter) LoadIdf(file io.Reader) error
- func (t *TagExtracter) LoadIdfAt(fileName string) error
- func (t *TagExtracter) LoadStopWords(file io.Reader) error
- func (t *TagExtracter) LoadStopWordsAt(file string) error
- type TextRanker
Constants ¶
This section is empty.
Variables ¶
var DefaultStopWordMap = map[string]int{
"the": 1,
"of": 1,
"is": 1,
"and": 1,
"to": 1,
"in": 1,
"that": 1,
"we": 1,
"for": 1,
"an": 1,
"are": 1,
"by": 1,
"be": 1,
"as": 1,
"on": 1,
"with": 1,
"can": 1,
"if": 1,
"from": 1,
"which": 1,
"you": 1,
"it": 1,
"this": 1,
"then": 1,
"at": 1,
"have": 1,
"all": 1,
"not": 1,
"one": 1,
"has": 1,
"or": 1,
}
DefaultStopWordMap contains a default set of common English stop words.
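A map of this shape is typically used as a set: a word is a stop word if it appears as a key. The following is a minimal sketch of that lookup, not the package's own code. The `stopWords` map is a shortened, hypothetical stand-in for DefaultStopWordMap, and `filterStopWords` and its case-insensitive matching are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a hypothetical, shortened stand-in for DefaultStopWordMap.
// Only the map[string]int shape is taken from the documentation.
var stopWords = map[string]int{"the": 1, "of": 1, "is": 1, "and": 1, "to": 1}

// filterStopWords returns the words that are not keys of stop.
// Matching is case-insensitive, which is an assumption of this sketch.
func filterStopWords(words []string, stop map[string]int) []string {
	kept := make([]string, 0, len(words))
	for _, w := range words {
		if _, found := stop[strings.ToLower(w)]; !found {
			kept = append(kept, w)
		}
	}
	return kept
}

func main() {
	words := strings.Fields("The weight of a keyword is relative to the corpus")
	fmt.Println(filterStopWords(words, stopWords))
}
```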
Functions ¶
This section is empty.
Types ¶
type Idf ¶
Idf represents a thread-safe dictionary of words and their IDF (inverse document frequency) values.
func (*Idf) AddToken ¶
func (i *Idf) AddToken(token dictionary.Token)
AddToken adds a new word with its IDF to the dictionary.
func (*Idf) Load ¶
func (i *Idf) Load(tokens ...dictionary.Token)
Load loads all tokens into the dictionary.
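The fields of Idf are not shown here, so the following is only a sketch of how a thread-safe word-to-IDF dictionary with AddToken-style and Load-style methods could be built on `sync.RWMutex`. The `token` type is a hypothetical stand-in for dictionary.Token, and `idfDict`, `addToken`, `load` and `lookup` are illustrative names, not the package's API.

```go
package main

import (
	"fmt"
	"sync"
)

// token is a hypothetical stand-in for dictionary.Token: a word plus its IDF.
type token struct {
	text string
	idf  float64
}

// idfDict is a sketch of a thread-safe word -> IDF dictionary.
type idfDict struct {
	mu   sync.RWMutex
	freq map[string]float64
}

func newIdfDict() *idfDict {
	return &idfDict{freq: make(map[string]float64)}
}

// addToken stores one word under the write lock.
func (d *idfDict) addToken(t token) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.freq[t.text] = t.idf
}

// load stores several tokens while taking the write lock only once.
func (d *idfDict) load(tokens ...token) {
	d.mu.Lock()
	defer d.mu.Unlock()
	for _, t := range tokens {
		d.freq[t.text] = t.idf
	}
}

// lookup returns the stored IDF and whether the word is known.
// Readers share the read lock, so concurrent lookups do not block each other.
func (d *idfDict) lookup(word string) (float64, bool) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	v, ok := d.freq[word]
	return v, ok
}

func main() {
	d := newIdfDict()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			d.addToken(token{text: fmt.Sprintf("word%d", n), idf: float64(n)})
		}(i)
	}
	wg.Wait()
	d.load(token{text: "黑夜", idf: 7.5}, token{text: "孙悟空", idf: 9.2})
	fmt.Println(d.lookup("黑夜"))
}
```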
type Segment ¶
type Segment struct {
// contains filtered or unexported fields
}
Segment represents a word with weight.
type StopWord ¶
StopWord is a thread-safe dictionary for all stop words.
func NewStopWord ¶
func NewStopWord() *StopWord
NewStopWord creates a new StopWord with the default stop words.
func (*StopWord) AddToken ¶
func (s *StopWord) AddToken(token dictionary.Token)
AddToken adds a token to the StopWord dictionary.
func (*StopWord) IsStopWord ¶
IsStopWord checks whether a given word is a stop word.
func (*StopWord) Load ¶
func (s *StopWord) Load(tokens ...dictionary.Token)
Load loads all tokens into the StopWord dictionary.
type TagExtracter ¶
type TagExtracter struct {
// contains filtered or unexported fields
}
TagExtracter is used to extract tags from a sentence.
func (*TagExtracter) ExtractTags ¶
func (t *TagExtracter) ExtractTags(sentence string, topK int) (tags Segments)
ExtractTags extracts the topK keywords from the sentence.
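Because the extractor loads an IDF dictionary, ranking by TF-IDF is the natural reading of "top K keywords". The sketch below shows that scoring on an already segmented sentence. It is an illustration under stated assumptions rather than the package's implementation: `tag`, `topKByTFIDF`, the `defaultIDF` fallback for unknown words, and the sample IDF values are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// tag is a hypothetical stand-in for a Segment: a word and its weight.
type tag struct {
	text   string
	weight float64
}

// topKByTFIDF scores each distinct word as (count / total words) * IDF
// and returns at most topK tags, highest weight first.
// Words missing from idf fall back to defaultIDF (an assumption of this sketch).
func topKByTFIDF(words []string, idf map[string]float64, defaultIDF float64, topK int) []tag {
	if len(words) == 0 || topK <= 0 {
		return nil
	}
	counts := make(map[string]int)
	for _, w := range words {
		counts[w]++
	}
	total := float64(len(words))
	tags := make([]tag, 0, len(counts))
	for w, c := range counts {
		v, ok := idf[w]
		if !ok {
			v = defaultIDF
		}
		tags = append(tags, tag{text: w, weight: float64(c) / total * v})
	}
	// Sort by descending weight; break ties by text so the order is stable.
	sort.Slice(tags, func(i, j int) bool {
		if tags[i].weight != tags[j].weight {
			return tags[i].weight > tags[j].weight
		}
		return tags[i].text < tags[j].text
	})
	if len(tags) > topK {
		tags = tags[:topK]
	}
	return tags
}

func main() {
	// Pre-segmented input and made-up IDF values, for illustration only.
	words := []string{"我", "爱", "北京", "我", "爱", "Python"}
	idf := map[string]float64{"我": 1, "爱": 2, "北京": 6, "Python": 10}
	for _, t := range topKByTFIDF(words, idf, 5, 2) {
		fmt.Printf("%s %.3f\n", t.text, t.weight)
	}
}
```

Frequent but uninformative words receive a low IDF, so they drop out of the top K even when they occur often in the sentence.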
func (*TagExtracter) LoadDictionary ¶
func (t *TagExtracter) LoadDictionary(file io.Reader) (err error)
LoadDictionary reads from the given reader and creates a new dictionary.
func (*TagExtracter) LoadDictionaryAt ¶
func (t *TagExtracter) LoadDictionaryAt(file string) (err error)
LoadDictionaryAt reads the file at the given path and creates a new dictionary.
func (*TagExtracter) LoadIdf ¶
func (t *TagExtracter) LoadIdf(file io.Reader) error
LoadIdf reads from the given reader and creates a new Idf dictionary.
func (*TagExtracter) LoadIdfAt ¶
func (t *TagExtracter) LoadIdfAt(fileName string) error
LoadIdfAt reads the file with the given name and creates a new Idf dictionary.
func (*TagExtracter) LoadStopWords ¶
func (t *TagExtracter) LoadStopWords(file io.Reader) error
LoadStopWords reads from the given reader and creates a new StopWord dictionary.
func (*TagExtracter) LoadStopWordsAt ¶
func (t *TagExtracter) LoadStopWordsAt(file string) error
LoadStopWordsAt reads the file at the given path and creates a new StopWord dictionary.
type TextRanker ¶
TextRanker is used to extract tags from a sentence.
func NewTextRanker ¶
func NewTextRanker(file io.Reader) (*TextRanker, error)
NewTextRanker reads a dictionary from the given reader and creates a new TextRanker that uses it.
func NewTextRankerAt ¶
func NewTextRankerAt(file string) (*TextRanker, error)
NewTextRankerAt reads a dictionary from the file at the given path and creates a new TextRanker that uses it.
func (*TextRanker) TextRank ¶
func (t *TextRanker) TextRank(sentence string, topK int) Segments
TextRank extracts keywords from the sentence using the TextRank algorithm. The topK parameter specifies the maximum number of top keywords to return.
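For readers unfamiliar with TextRank, the sketch below shows the general idea on an already segmented sentence: build an undirected co-occurrence graph over a sliding window, run PageRank-style iterations, and scale the scores so the best word has weight 1. It is an illustration, not the package's code. `ranked`, `textRank`, the synchronous update, the divide-by-maximum scaling, and the parameter values used in `main` are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// ranked is a hypothetical stand-in for a Segment: a word and its weight.
type ranked struct {
	text   string
	weight float64
}

// textRank links words that co-occur within span positions of each other,
// runs a fixed number of PageRank-style iterations with damping factor d,
// and returns at most topK words scaled so that the best score is 1.
func textRank(words []string, span, iterations int, d float64, topK int) []ranked {
	// Give each distinct word an index in order of first appearance.
	index := make(map[string]int)
	var vocab []string
	for _, w := range words {
		if _, ok := index[w]; !ok {
			index[w] = len(vocab)
			vocab = append(vocab, w)
		}
	}
	n := len(vocab)
	if n == 0 || topK <= 0 {
		return nil
	}
	// weight[a][b] counts how often words a and b co-occur within the window.
	weight := make([][]float64, n)
	for i := range weight {
		weight[i] = make([]float64, n)
	}
	for i := range words {
		for j := i + 1; j < len(words) && j < i+span; j++ {
			if a, b := index[words[i]], index[words[j]]; a != b {
				weight[a][b]++
				weight[b][a]++
			}
		}
	}
	// outSum[j] is the total edge weight attached to node j.
	outSum := make([]float64, n)
	score := make([]float64, n)
	for j := range weight {
		for _, w := range weight[j] {
			outSum[j] += w
		}
		score[j] = 1 / float64(n)
	}
	for it := 0; it < iterations; it++ {
		next := make([]float64, n)
		for i := 0; i < n; i++ {
			sum := 0.0
			for j := 0; j < n; j++ {
				if weight[j][i] > 0 {
					sum += weight[j][i] / outSum[j] * score[j]
				}
			}
			next[i] = (1 - d) + d*sum
		}
		score = next
	}
	maxScore := 0.0
	for _, s := range score {
		if s > maxScore {
			maxScore = s
		}
	}
	result := make([]ranked, n)
	for i, w := range vocab {
		result[i] = ranked{text: w, weight: score[i]}
		if maxScore > 0 {
			result[i].weight /= maxScore
		}
	}
	// Sort by descending weight; break ties by text so the order is stable.
	sort.Slice(result, func(i, j int) bool {
		if result[i].weight != result[j].weight {
			return result[i].weight > result[j].weight
		}
		return result[i].text < result[j].text
	})
	if len(result) > topK {
		result = result[:topK]
	}
	return result
}

func main() {
	// Pre-segmented input; span 5, 10 iterations and d = 0.85 are assumed values.
	words := []string{"吉林", "欧亚", "置业", "增资", "吉林", "欧亚", "置业", "注册资本", "吉林", "欧亚", "城市", "商业"}
	for _, r := range textRank(words, 5, 10, 0.85, 3) {
		fmt.Printf("%s %f\n", r.text, r.weight)
	}
}
```

Unlike TF-IDF, this ranking needs no IDF file: a word scores highly when it co-occurs with many other well-connected words in the same text.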
func (*TextRanker) TextRankWithPOS ¶
func (t *TextRanker) TextRankWithPOS(sentence string, topK int, allowPOS []string) Segments
TextRankWithPOS extracts keywords from the sentence using the TextRank algorithm. The allowPOS parameter specifies a custom list of allowed POS (part-of-speech) tags.