segmenter

package

v0.0.0-...-5e73d17 Latest Published: Feb 24, 2016 License: GPL-2.0 Imports: 8 Imported by: 2

const (
	MinTokenFrequency = 2 // only tokens whose frequency is greater than or equal to this value are read from dictionary files
)
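The effect of this threshold can be illustrated with a minimal sketch. It is not the package's own loading code: the `entry` type and the `filterByFrequency` helper are hypothetical names introduced here, and the local constant only mirrors the documented value of `MinTokenFrequency`.

```go
package main

import "fmt"

// minTokenFrequency mirrors the documented value of MinTokenFrequency.
const minTokenFrequency = 2

// entry is a hypothetical in-memory form of one dictionary entry.
type entry struct {
	Text      string
	Frequency int
}

// filterByFrequency keeps only the entries whose frequency is greater
// than or equal to threshold, preserving their original order.
func filterByFrequency(entries []entry, threshold int) []entry {
	kept := make([]entry, 0, len(entries))
	for _, e := range entries {
		if e.Frequency >= threshold {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	entries := []entry{{"中国", 100}, {"罕见词", 1}, {"分词", 2}}
	for _, e := range filterByFrequency(entries, minTokenFrequency) {
		fmt.Println(e.Text, e.Frequency)
	}
}
```

With the value 2, an entry with frequency 1 is dropped, while entries with frequency 2 or higher are kept.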

Segmenter interface.

type ChinaCut struct {
	// contains filtered or unexported fields
}

The segmenter struct.

func InitChinaCut(files string) *ChinaCut

func (self *ChinaCut) Cut(bytes []byte, model bool) []search.Segment

Cut segments the given text.

Input parameters:

bytes	byte slice containing UTF-8 text

Output:

[]Segment	the resulting segments

func (self *ChinaCut) Dictionary() *search.Dictionary

Dictionary returns the dictionary used by the segmenter.

func (self *ChinaCut) LoadDictionary(files string)

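LoadDictionary loads dictionaries from files. Multiple dictionary files can be loaded; file names are separated by ",", and dictionaries listed earlier are loaded first, for example

"user_dictionary.txt,general_dictionary.txt"

When a token appears in both the user dictionary and the general dictionary, the entry from the user dictionary takes precedence. The dictionary format is (one token per line):

token_text frequency part_of_speech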
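The line format and the precedence rule described above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the `token` type, `parseLine`, and `mergeDictionaries` are hypothetical names, fields are assumed to be whitespace-separated, malformed lines are assumed to be skipped, and file contents are passed in as strings instead of being read from disk.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// token is a hypothetical representation of one dictionary line.
type token struct {
	Text      string
	Frequency int
	POS       string // part of speech; left empty if the line omits it
}

// parseLine parses "text frequency part_of_speech". It reports false for
// lines with fewer than two fields or a non-numeric frequency (assumption).
func parseLine(line string) (token, bool) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return token{}, false
	}
	freq, err := strconv.Atoi(fields[1])
	if err != nil {
		return token{}, false
	}
	t := token{Text: fields[0], Frequency: freq}
	if len(fields) > 2 {
		t.POS = fields[2]
	}
	return t, true
}

// mergeDictionaries walks the comma-separated names in order. The first
// dictionary that defines a token wins; later definitions are ignored.
func mergeDictionaries(files string, contents map[string]string) map[string]token {
	merged := make(map[string]token)
	for _, name := range strings.Split(files, ",") {
		for _, line := range strings.Split(contents[name], "\n") {
			t, ok := parseLine(line)
			if !ok {
				continue
			}
			if _, seen := merged[t.Text]; !seen {
				merged[t.Text] = t
			}
		}
	}
	return merged
}

func main() {
	// Hypothetical file names and contents, used only for illustration.
	contents := map[string]string{
		"user.txt":    "分词 50 n",
		"general.txt": "分词 10 v\n词典 20 n",
	}
	merged := mergeDictionaries("user.txt,general.txt", contents)
	fmt.Println(merged["分词"].Frequency, merged["词典"].Frequency)
}
```

Here the token defined in both inputs keeps the frequency from the first file (50), while the token defined only in the second file is still added.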