Documentation ¶
Overview ¶
Package tokenizer is a Japanese morphological analyzer library.
Index ¶
- Constants
- type Dic
  - func SysDic() Dic
- type Token
- type TokenClass
  - func (c TokenClass) String() string
- type TokenizeMode
- type Tokenizer
  - func (t Tokenizer) Analyze(input string, mode TokenizeMode) (tokens []Token)
  - func (t Tokenizer) AnalyzeGraph(input string, mode TokenizeMode, w io.Writer) (tokens []Token)
  - func (t Tokenizer) Dot(input string, w io.Writer) (tokens []Token)
  - func (t *Tokenizer) SetDic(d Dic)
  - func (t *Tokenizer) SetUserDic(d UserDic)
  - func (t Tokenizer) Tokenize(input string) []Token
- type UserDic
- type UserDicRecord
- type UserDicRecords
  - func NewUserDicRecords(r io.Reader) (UserDicRecords, error)
  - func (u UserDicRecords) Len() int
  - func (u UserDicRecords) Less(i, j int) bool
  - func (u UserDicRecords) NewUserDic() (UserDic, error)
  - func (u UserDicRecords) Swap(i, j int)
Constants ¶
const (
	// DUMMY represents the dummy token.
	DUMMY = TokenClass(lattice.DUMMY)
	// KNOWN represents the token in the dictionary.
	KNOWN = TokenClass(lattice.KNOWN)
	// UNKNOWN represents the token which is not in the dictionary.
	UNKNOWN = TokenClass(lattice.UNKNOWN)
	// USER represents the token in the user dictionary.
	USER = TokenClass(lattice.USER)
)
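As a quick, hedged illustration, the sketch below tokenizes a sentence and branches on each token's class. The import path github.com/ikawaha/kagome/tokenizer is assumed, and the tokenizer is built with NewWithDic and SysDic (both documented below); treat this as a sketch, not canonical usage.

package main

import (
	"fmt"

	"github.com/ikawaha/kagome/tokenizer" // assumed import path
)

func main() {
	// Build a tokenizer backed by the system (IPA) dictionary.
	t := tokenizer.NewWithDic(tokenizer.SysDic())
	for _, tok := range t.Tokenize("すもももももももものうち") {
		switch tok.Class {
		case tokenizer.DUMMY:
			continue // skip the BOS/EOS sentinel tokens
		case tokenizer.KNOWN:
			fmt.Println("known:", tok.Surface)
		case tokenizer.UNKNOWN:
			fmt.Println("unknown:", tok.Surface)
		case tokenizer.USER:
			fmt.Println("user:", tok.Surface)
		}
	}
}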
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Dic ¶ added in v1.0.0
type Dic struct {
// contains filtered or unexported fields
}
Dic represents a dictionary.
func SysDic ¶ added in v1.0.0
func SysDic() Dic
SysDic returns the system dictionary (IPA dictionary).
type Token ¶ added in v1.0.0
type Token struct {
	ID      int
	Class   TokenClass
	Start   int
	End     int
	Surface string
	// contains filtered or unexported fields
}
Token represents a morph of a sentence.
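To see the exported fields in context, here is a minimal sketch (same assumed import path as above) that dumps each morph's ID, class, span, and surface form:

package main

import (
	"fmt"

	"github.com/ikawaha/kagome/tokenizer" // assumed import path
)

// printTokens prints the exported fields of each token.
func printTokens(tokens []tokenizer.Token) {
	for _, tok := range tokens {
		fmt.Printf("%v\t%v\t(%d, %d)\t%s\n",
			tok.ID, tok.Class, tok.Start, tok.End, tok.Surface)
	}
}

func main() {
	t := tokenizer.NewWithDic(tokenizer.SysDic())
	printTokens(t.Tokenize("ねこじゃらし"))
}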
type TokenClass ¶ added in v1.0.0
TokenClass represents the token type.
func (TokenClass) String ¶ added in v1.0.0
func (c TokenClass) String() string
type TokenizeMode ¶ added in v1.0.0
type TokenizeMode int
TokenizeMode represents a tokenize mode.
const (
	// Normal is the normal tokenize mode.
	Normal TokenizeMode = iota + 1
	// Search is the tokenize mode for search.
	Search
	// Extended is the experimental tokenize mode.
	Extended
	// BosEosID means the beginning of a sentence (BOS) or the end of a sentence (EOS).
	BosEosID = lattice.BosEosID
)
type Tokenizer ¶ added in v0.0.2
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer represents a morphological analyzer.
func NewWithDic ¶ added in v1.0.0
NewWithDic creates a tokenizer with the specified dictionary.
func (Tokenizer) Analyze ¶ added in v1.0.0
func (t Tokenizer) Analyze(input string, mode TokenizeMode) (tokens []Token)
Analyze tokenizes a sentence in the specified mode.
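The three modes can produce different segmentations of the same input. A hedged sketch comparing them (the input is illustrative; the exact splits depend on the dictionary):

package main

import (
	"fmt"
	"strings"

	"github.com/ikawaha/kagome/tokenizer" // assumed import path
)

func main() {
	t := tokenizer.NewWithDic(tokenizer.SysDic())
	for _, mode := range []tokenizer.TokenizeMode{
		tokenizer.Normal, tokenizer.Search, tokenizer.Extended,
	} {
		var surfaces []string
		for _, tok := range t.Analyze("関西国際空港", mode) {
			if tok.Class == tokenizer.DUMMY {
				continue // skip BOS/EOS
			}
			surfaces = append(surfaces, tok.Surface)
		}
		fmt.Printf("mode %v: %s\n", mode, strings.Join(surfaces, "/"))
	}
}

In search mode, long compound nouns tend to be split into shorter units than in normal mode, which generally improves recall when the tokens are used as search keys.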
func (Tokenizer) AnalyzeGraph ¶ added in v1.2.0
func (t Tokenizer) AnalyzeGraph(input string, mode TokenizeMode, w io.Writer) (tokens []Token)
AnalyzeGraph returns the morphs of a sentence and writes the lattice graph to w in dot format.
func (Tokenizer) Dot ¶ added in v1.0.0
func (t Tokenizer) Dot(input string, w io.Writer) (tokens []Token)
Dot returns the morphs of a sentence and writes the lattice graph to w in dot format, using the normal tokenize mode.
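A sketch that writes the lattice of a short input to stdout; the output can be piped to Graphviz (for example, go run main.go | dot -Tpng -o lattice.png). AnalyzeGraph works the same way but takes an explicit TokenizeMode.

package main

import (
	"fmt"
	"os"

	"github.com/ikawaha/kagome/tokenizer" // assumed import path
)

func main() {
	t := tokenizer.NewWithDic(tokenizer.SysDic())
	// Write the lattice in dot format to stdout; the returned
	// tokens are the analysis result for the same input.
	tokens := t.Dot("ねこです", os.Stdout)
	fmt.Fprintln(os.Stderr, "token count:", len(tokens))
}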
func (*Tokenizer) SetUserDic ¶ added in v0.0.2
func (t *Tokenizer) SetUserDic(d UserDic)
SetUserDic sets the user dictionary.
type UserDic ¶ added in v1.0.0
type UserDic struct {
// contains filtered or unexported fields
}
UserDic represents a user dictionary.
func NewUserDic ¶ added in v1.0.0
NewUserDic builds a user dictionary from a file.
type UserDicRecord ¶ added in v1.5.0
type UserDicRecord struct {
	Text   string   `json:"text"`
	Tokens []string `json:"tokens"`
	Yomi   []string `json:"yomi"`
	Pos    string   `json:"pos"`
}
UserDicRecord represents a record of the user dictionary file format.
type UserDicRecords ¶ added in v1.5.0
type UserDicRecords []UserDicRecord
UserDicRecords represents user dictionary data.
func NewUserDicRecords ¶ added in v1.5.0
func NewUserDicRecords(r io.Reader) (UserDicRecords, error)
NewUserDicRecords loads user dictionary data from an io.Reader.
func (UserDicRecords) Len ¶ added in v1.5.0
func (u UserDicRecords) Len() int
func (UserDicRecords) Less ¶ added in v1.5.0
func (u UserDicRecords) Less(i, j int) bool
func (UserDicRecords) NewUserDic ¶ added in v1.5.0
func (u UserDicRecords) NewUserDic() (UserDic, error)
NewUserDic builds a user dictionary.
func (UserDicRecords) Swap ¶ added in v1.5.0
func (u UserDicRecords) Swap(i, j int)
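Len, Less, and Swap give UserDicRecords the method set of sort.Interface, so a record list can be ordered with the standard sort package. Putting the pieces together, the sketch below loads records from JSON, builds a user dictionary, and attaches it to a tokenizer; the JSON keys follow the struct tags on UserDicRecord, and the sample entry is illustrative only.

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/ikawaha/kagome/tokenizer" // assumed import path
)

func main() {
	// One illustrative record; keys match the json tags on UserDicRecord.
	const data = `[{"text":"朝青龍","tokens":["朝青龍"],"yomi":["アサショウリュウ"],"pos":"カスタム名詞"}]`

	records, err := tokenizer.NewUserDicRecords(strings.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}
	udic, err := records.NewUserDic()
	if err != nil {
		log.Fatal(err)
	}

	t := tokenizer.NewWithDic(tokenizer.SysDic())
	t.SetUserDic(udic)

	// Tokens that matched the user dictionary are classed USER.
	for _, tok := range t.Tokenize("朝青龍") {
		if tok.Class == tokenizer.USER {
			fmt.Println("user token:", tok.Surface)
		}
	}
}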