Documentation ¶
Overview ¶
Package tokenizer is a Japanese morphological analyzer library.
Example (Tokenize_mode) ¶
d, err := dict.LoadDictFile(testDictPath)
if err != nil {
	panic(err)
}
t, err := New(d)
if err != nil {
	panic(err)
}
for _, mode := range []TokenizeMode{Normal, Search, Extended} {
	tokens := t.Analyze("関西国際空港", mode)
	fmt.Printf("---%s---\n", mode)
	for _, token := range tokens {
		if token.Class == DUMMY {
			// BOS: Begin Of Sentence, EOS: End Of Sentence.
			fmt.Printf("%s\n", token.Surface)
			continue
		}
		features := strings.Join(token.Features(), ",")
		fmt.Printf("%s\t%v\n", token.Surface, features)
	}
}
Output:
Index ¶
- Constants
- type Option
- type Token
- func (t Token) BaseForm() (string, bool)
- func (t Token) FeatureAt(i int) (string, bool)
- func (t Token) Features() []string
- func (t Token) InflectionalForm() (string, bool)
- func (t Token) InflectionalType() (string, bool)
- func (t Token) POS() []string
- func (t Token) Pronunciation() (string, bool)
- func (t Token) Reading() (string, bool)
- func (t Token) String() string
- type TokenClass
- type TokenizeMode
- type Tokenizer
- func (t Tokenizer) Analyze(input string, mode TokenizeMode) (tokens []Token)
- func (t Tokenizer) AnalyzeGraph(w io.Writer, input string, mode TokenizeMode) (tokens []Token)
- func (t Tokenizer) Dot(w io.Writer, input string) (tokens []Token)
- func (t Tokenizer) Tokenize(input string) []Token
- func (t Tokenizer) Wakati(input string) []string
Examples ¶
Constants ¶
const (
	// DUMMY represents the dummy token.
	DUMMY = TokenClass(lattice.DUMMY)
	// KNOWN represents a token that is in the dictionary.
	KNOWN = TokenClass(lattice.KNOWN)
	// UNKNOWN represents a token that is not in the dictionary.
	UNKNOWN = TokenClass(lattice.UNKNOWN)
	// USER represents a token that is in the user dictionary.
	USER = TokenClass(lattice.USER)
)
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Token ¶
type Token struct {
	ID      int
	Class   TokenClass
	Start   int
	End     int
	Surface string
	// contains filtered or unexported fields
}
Token represents a morph of a sentence.
func (Token) InflectionalForm ¶
InflectionalForm returns the inflectional form feature if exists.
func (Token) InflectionalType ¶
InflectionalType returns the inflectional type feature if exists.
func (Token) Pronunciation ¶
Pronunciation returns the pronunciation feature if exists.
type TokenClass ¶
TokenClass represents the token class.
func (TokenClass) String ¶
func (c TokenClass) String() string
String returns string representation of a token class.
type TokenizeMode ¶
type TokenizeMode int
TokenizeMode represents a tokenization mode.
const (
	// Kagome has segmentation modes for search, such as Kuromoji:
	//   Normal: regular segmentation
	//   Search: use a heuristic to do additional segmentation useful for search
	//   Extended: similar to search mode, but also unigram unknown words

	// Normal is the normal tokenize mode.
	Normal TokenizeMode = iota + 1
	// Search is the tokenize mode for search.
	Search
	// Extended is the experimental tokenize mode.
	Extended

	// BosEosID marks the beginning or the end of a sentence.
	BosEosID = lattice.BosEosID
)
func (TokenizeMode) String ¶
func (m TokenizeMode) String() string
type Tokenizer ¶
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer represents a morphological analyzer.
func (Tokenizer) Analyze ¶
func (t Tokenizer) Analyze(input string, mode TokenizeMode) (tokens []Token)
Analyze tokenizes a sentence in the specified mode.
func (Tokenizer) AnalyzeGraph ¶
AnalyzeGraph returns morphs of a sentence and exports a lattice graph to dot format.
func (Tokenizer) Dot ¶
Dot returns morphs of a sentence and exports a lattice graph to dot format in standard tokenize mode.