tokenizer

package
v1.3.0
Published: Jan 12, 2016 License: Apache-2.0 Imports: 6 Imported by: 0

Documentation

Overview

Package tokenizer is a Japanese morphological analyzer library.

Index

Constants

const (
	// DUMMY represents the dummy token.
	DUMMY = TokenClass(lattice.DUMMY)
	// KNOWN represents the token in the dictionary.
	KNOWN = TokenClass(lattice.KNOWN)
	// UNKNOWN represents the token which is not in the dictionary.
	UNKNOWN = TokenClass(lattice.UNKNOWN)
	// USER represents the token in the user dictionary.
	USER = TokenClass(lattice.USER)
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Dic added in v1.0.0

type Dic struct {
	// contains filtered or unexported fields
}

Dic represents a dictionary.

func NewDic added in v1.0.0

func NewDic(path string) (Dic, error)

NewDic loads a dictionary from a file.

func SysDic added in v1.0.0

func SysDic() Dic

SysDic returns the system dictionary (IPA dictionary).

func SysDicIPA added in v1.3.0

func SysDicIPA() Dic

SysDicIPA returns the IPA dictionary as the system dictionary.

func SysDicUni added in v1.3.0

func SysDicUni() Dic

SysDicUni returns the UniDic dictionary as the system dictionary.

type Token added in v1.0.0

type Token struct {
	ID      int
	Class   TokenClass
	Start   int
	End     int
	Surface string
	// contains filtered or unexported fields
}

Token represents a morph of a sentence.

func (Token) Features added in v1.0.0

func (t Token) Features() (features []string)

Features returns the feature list of a token.

func (Token) String added in v1.0.0

func (t Token) String() string

String returns a string representation of a token.

type TokenClass added in v1.0.0

type TokenClass lattice.NodeClass

TokenClass represents the token type.

func (TokenClass) String added in v1.0.0

func (c TokenClass) String() string

type TokenizeMode added in v1.0.0

type TokenizeMode int

TokenizeMode represents a tokenize mode.

const (

	// Normal is the normal tokenize mode.
	Normal TokenizeMode = iota + 1
	// Search is the tokenize mode for search.
	Search
	// Extended is the experimental tokenize mode.
	Extended
	// BosEosID means the beginning or the end of a sentence.
	BosEosID = lattice.BosEosID
)

type Tokenizer added in v0.0.2

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer represents a morphological analyzer.

func New added in v1.0.0

func New() (t Tokenizer)

New creates a default tokenizer.

func NewWithDic added in v1.0.0

func NewWithDic(d Dic) (t Tokenizer)

NewWithDic creates a tokenizer with the specified dictionary.

func (Tokenizer) Analyze added in v1.0.0

func (t Tokenizer) Analyze(input string, mode TokenizeMode) (tokens []Token)

Analyze tokenizes a sentence in the specified mode.

func (Tokenizer) AnalyzeGraph added in v1.2.0

func (t Tokenizer) AnalyzeGraph(input string, mode TokenizeMode, w io.Writer) (tokens []Token)

AnalyzeGraph returns morphs of a sentence and exports a lattice graph in dot format.

func (Tokenizer) Dot added in v1.0.0

func (t Tokenizer) Dot(input string, w io.Writer) (tokens []Token)

Dot returns morphs of a sentence and exports a lattice graph in dot format, using the standard tokenize mode.

func (*Tokenizer) SetDic added in v1.0.0

func (t *Tokenizer) SetDic(d Dic)

SetDic sets the tokenizer's dictionary.

func (*Tokenizer) SetUserDic added in v0.0.2

func (t *Tokenizer) SetUserDic(d UserDic)

SetUserDic sets the tokenizer's user dictionary.

func (Tokenizer) Tokenize added in v0.0.2

func (t Tokenizer) Tokenize(input string) []Token

Tokenize analyzes a sentence in the standard tokenize mode.

type UserDic added in v1.0.0

type UserDic struct {
	// contains filtered or unexported fields
}

UserDic represents a user dictionary.

func NewUserDic added in v1.0.0

func NewUserDic(path string) (UserDic, error)

NewUserDic builds a user dictionary from a file.
