Documentation
Overview
Package tokenizer is a Japanese morphological analyzer library.
Index
- Constants
- type Dic
- type Token
- type TokenClass
- type TokenizeMode
- type Tokenizer
- func (t Tokenizer) Analyze(input string, mode TokenizeMode) (tokens []Token)
- func (t Tokenizer) AnalyzeGraph(input string, mode TokenizeMode, w io.Writer) (tokens []Token)
- func (t Tokenizer) Dot(input string, w io.Writer) (tokens []Token)
- func (t *Tokenizer) SetDic(d Dic)
- func (t *Tokenizer) SetUserDic(d UserDic)
- func (t Tokenizer) Tokenize(input string) []Token
- type UserDic
Constants
const (
	// DUMMY represents the dummy token.
	DUMMY = TokenClass(lattice.DUMMY)
	// KNOWN represents a token found in the dictionary.
	KNOWN = TokenClass(lattice.KNOWN)
	// UNKNOWN represents a token that is not in the dictionary.
	UNKNOWN = TokenClass(lattice.UNKNOWN)
	// USER represents a token found in the user dictionary.
	USER = TokenClass(lattice.USER)
)
Variables
This section is empty.
Functions
This section is empty.
Types
type Dic added in v1.0.0
type Dic struct {
// contains filtered or unexported fields
}
Dic represents a dictionary.
func SysDic added in v1.0.0
func SysDic() Dic
SysDic returns the system dictionary (IPA dictionary).
type Token added in v1.0.0
type Token struct {
	ID      int
	Class   TokenClass
	Start   int
	End     int
	Surface string
	// contains filtered or unexported fields
}
Token represents a morpheme of a sentence.
type TokenClass added in v1.0.0
TokenClass represents the token type.
func (TokenClass) String added in v1.0.0
func (c TokenClass) String() string
type TokenizeMode added in v1.0.0
type TokenizeMode int
TokenizeMode represents a tokenization mode.
const (
	// Normal is the normal tokenize mode.
	Normal TokenizeMode = iota + 1
	// Search is the tokenize mode for search.
	Search
	// Extended is the experimental tokenize mode.
	Extended
	// BosEosID identifies the beginning or the end of a sentence.
	BosEosID = lattice.BosEosID
)
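Because the mode constants start at iota + 1, Normal, Search and Extended have the values 1, 2 and 3, and the zero value of TokenizeMode matches none of them. A local copy of the declaration makes this concrete; tokenizeMode and the validMode helper are hypothetical and not part of the package.

```go
package main

import "fmt"

// tokenizeMode mirrors the documented TokenizeMode declaration; it is a
// local copy for illustration, not the tokenizer package's type.
type tokenizeMode int

const (
	normal   tokenizeMode = iota + 1 // 1
	search                           // 2
	extended                         // 3
)

// validMode is a hypothetical helper: because the constants start at 1,
// the zero value of tokenizeMode matches none of them.
func validMode(m tokenizeMode) bool {
	switch m {
	case normal, search, extended:
		return true
	}
	return false
}

func main() {
	fmt.Println(normal, search, extended) // 1 2 3
	var m tokenizeMode                    // zero value
	fmt.Println(validMode(m))             // false
}
```

Starting at 1 presumably keeps an uninitialized mode from silently selecting Normal, though this page does not state the motivation.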
type Tokenizer added in v0.0.2
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer represents a morphological analyzer.
func NewWithDic added in v1.0.0
NewWithDic creates a tokenizer with the specified dictionary.
func (Tokenizer) Analyze added in v1.0.0
func (t Tokenizer) Analyze(input string, mode TokenizeMode) (tokens []Token)
Analyze tokenizes a sentence in the specified mode.
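The DUMMY class and the BosEosID constant suggest that the slice returned by Analyze may include sentence-boundary tokens alongside the actual morphemes. Assuming that, callers who only want content usually skip DUMMY tokens. The sketch uses hypothetical local types and a hand-built token slice in place of real analyzer output; the "BOS" and "EOS" surface strings are placeholders.

```go
package main

import "fmt"

// tokenClass and token are hypothetical local stand-ins for TokenClass and
// Token; they are not the tokenizer package's types.
type tokenClass int

const (
	dummy tokenClass = iota // assumed marker class for BOS/EOS tokens
	known
)

type token struct {
	Class   tokenClass
	Surface string
}

// contentTokens returns the tokens whose class is not dummy, keeping order.
func contentTokens(toks []token) []token {
	out := make([]token, 0, len(toks))
	for _, t := range toks {
		if t.Class == dummy {
			continue // skip the sentence-boundary token
		}
		out = append(out, t)
	}
	return out
}

func main() {
	// Hand-built stand-in for an Analyze result.
	toks := []token{
		{dummy, "BOS"},
		{known, "寿司"},
		{known, "が"},
		{known, "食べ"},
		{known, "たい"},
		{dummy, "EOS"},
	}
	for _, t := range contentTokens(toks) {
		fmt.Println(t.Surface)
	}
}
```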
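func (Tokenizer) AnalyzeGraph added in v1.2.0
func (t Tokenizer) AnalyzeGraph(input string, mode TokenizeMode, w io.Writer) (tokens []Token)
AnalyzeGraph returns the morphemes of a sentence, tokenized in the specified mode, and exports the lattice graph to w in DOT format.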
func (Tokenizer) Dot added in v1.0.0
func (t Tokenizer) Dot(input string, w io.Writer) (tokens []Token)
Dot returns the morphemes of a sentence and exports the lattice graph to w in DOT format, using the standard (Normal) tokenize mode.
func (*Tokenizer) SetUserDic added in v0.0.2
func (t *Tokenizer) SetUserDic(d UserDic)
SetUserDic sets d as the tokenizer's user dictionary.
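SetUserDic and SetDic have pointer receivers, while Analyze, AnalyzeGraph and Dot use value receivers. In Go, a setter needs a pointer receiver because a method with a value receiver only modifies a copy. A minimal illustration with hypothetical types (userDic and analyzer are not part of the package):

```go
package main

import "fmt"

// userDic and analyzer are hypothetical stand-ins, not tokenizer types.
type userDic struct{ entries []string }

type analyzer struct {
	udic *userDic
}

// setUserDicByValue modifies a copy of the analyzer, so the caller's value
// is unchanged; this is why a setter needs a pointer receiver.
func (a analyzer) setUserDicByValue(d userDic) { a.udic = &d }

// SetUserDic uses a pointer receiver, like (*Tokenizer).SetUserDic.
func (a *analyzer) SetUserDic(d userDic) { a.udic = &d }

// hasUserDic has a value receiver, like the read-only Tokenizer methods.
func (a analyzer) hasUserDic() bool { return a.udic != nil }

func main() {
	var a analyzer
	a.setUserDicByValue(userDic{entries: []string{"東京スカイツリー"}})
	fmt.Println(a.hasUserDic()) // false: only the copy was modified
	a.SetUserDic(userDic{entries: []string{"東京スカイツリー"}})
	fmt.Println(a.hasUserDic()) // true
}
```

In practice this means calling SetUserDic on an addressable Tokenizer variable (or through a pointer); calling it directly on a function's return value does not compile.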