Documentation
Overview
Example
Output:

Default Mode:
Term: 永和 Start: 0 End: 6 Position: 1 Type: 1
Term: 服装 Start: 6 End: 12 Position: 2 Type: 1
Term: 饰品 Start: 12 End: 18 Position: 3 Type: 1
Term: 有限公司 Start: 18 End: 30 Position: 4 Type: 1

Search Mode:
Term: 永和 Start: 0 End: 6 Position: 1 Type: 1
Term: 服装 Start: 6 End: 12 Position: 2 Type: 1
Term: 饰品 Start: 12 End: 18 Position: 3 Type: 1
Term: 有限 Start: 18 End: 24 Position: 4 Type: 1
Term: 公司 Start: 24 End: 30 Position: 5 Type: 1
Term: 有限公司 Start: 18 End: 30 Position: 6 Type: 1
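A minimal sketch (plain Go, no jieba dependency) of why the offsets above advance by six per two-character term: Start and End are UTF-8 byte offsets, and each of these CJK characters occupies 3 bytes.

```go
package main

import "fmt"

func main() {
	// The Start/End values in the example output are byte offsets: each of
	// these CJK characters takes 3 bytes in UTF-8, so the two-character
	// term "永和" spans bytes 0-6, not 0-2.
	terms := []string{"永和", "服装", "饰品", "有限公司"}
	start := 0
	for _, t := range terms {
		end := start + len(t) // len() counts bytes in Go
		fmt.Printf("Term: %s Start: %d End: %d\n", t, start, end)
		start = end
	}
}
```

This reproduces the Start/End columns of the default-mode output above.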
Index
Examples
Constants
const Name = "jieba"
Name is the jieba tokenizer name.
Variables
This section is empty.
Functions
func JiebaTokenizerConstructor
func JiebaTokenizerConstructor(config map[string]interface{}, cache *registry.Cache) (analysis.Tokenizer, error)
JiebaTokenizerConstructor creates a JiebaTokenizer.
Parameter config should contain at least the file parameter:

file: the path of the dictionary file.
hmm: optional, specifies whether to use the Hidden Markov Model; see NewJiebaTokenizer for details.
search: optional, specifies whether to use search mode; see NewJiebaTokenizer for details.
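A sketch of the config map the constructor expects, using the three keys documented above. The path "dict.txt" is purely illustrative, not a file shipped with the package.

```go
package main

import "fmt"

func main() {
	// Sketch of the config map for JiebaTokenizerConstructor; "dict.txt"
	// is an illustrative dictionary path, not part of this package.
	config := map[string]interface{}{
		"file":   "dict.txt", // required: path of the dictionary file
		"hmm":    true,       // optional: use the Hidden Markov Model
		"search": true,       // optional: use search mode
	}
	// With a *registry.Cache in hand, construction would look roughly like:
	//   tokenizer, err := JiebaTokenizerConstructor(config, cache)
	fmt.Println("config keys:", len(config))
}
```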
func NewJiebaTokenizer
NewJiebaTokenizer creates a new JiebaTokenizer.
Parameters:
dictFilePath: path of the dictionary file.
hmm: whether to use the Hidden Markov Model to cut unknown words, i.e. words not found in the dictionary. For example, the word "安卓" ("Android" in English) is not in the dictionary file. If hmm is set to false, it will be cut into the two single-character words "安" and "卓"; if hmm is set to true, it will be treated as one word, because jieba uses a Hidden Markov Model with the Viterbi algorithm to guess the most likely segmentation.
searchMode: whether to further cut long words into several shorter words. In Chinese, some long words may contain other words; for example, "交换机" is the Chinese word for a network switch. If searchMode is false, "交换机" is treated as a single word. If searchMode is true, it is further split into "交换" and "换机", which are both valid Chinese words.
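The hmm fallback described above can be illustrated without jieba itself: when an out-of-vocabulary word is not merged by the model, segmentation degrades to one token per character.

```go
package main

import "fmt"

func main() {
	// "安卓" ("Android") is the documentation's out-of-vocabulary example:
	// with hmm=false, an unknown word falls back to single-character tokens.
	word := "安卓"
	for _, r := range word {
		fmt.Println(string(r)) // prints 安, then 卓
	}
	// With hmm=true, jieba's Viterbi decoding would instead keep "安卓"
	// together as one token.
}
```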
Types
type JiebaTokenizer
type JiebaTokenizer struct {
// contains filtered or unexported fields
}
JiebaTokenizer is the bleve tokenizer for jiebago.
func (*JiebaTokenizer) Tokenize
func (jt *JiebaTokenizer) Tokenize(input []byte) analysis.TokenStream
Tokenize cuts input into bleve token stream.