Documentation
Overview
Package dict implements the dictionary of the morph analyzer.
Index
- Constants
- func NewContents(b []byte) [][]string
- type CharCategory
- type CharClass
- type CharDef
- type ConnectionTable
- type Contents
- type ContentsMeta
- type Dict
- type GroupList
- type IndexTable
- type Info
- type InvokeList
- type Morph
- type Morphs
- type POS
- type POSID
- type POSMap
- type POSTable
- type SizeReaderAt
- type UnkDict
- type UserDicRecord
- type UserDict
- type UserDictContent
- type UserDictRecords
Constants
const (
	POSStartIndex      = "_pos_start"
	POSHierarchy       = "_pos_hierarchy"
	InflectionalType   = "_inflectional_type"
	InflectionalForm   = "_inflectional_form"
	BaseFormIndex      = "_base"
	ReadingIndex       = "_reading"
	PronunciationIndex = "_pronunciation"
)
const (
	// MorphDictFileName is the default file name of a morph dict.
	MorphDictFileName = "morph.dict"
	// POSDictFileName is the default file name of a part of speech dict.
	POSDictFileName = "pos.dict"
	// ContentMetaFileName is the default file name of content meta.
	ContentMetaFileName = "content.meta"
	// ContentDictFileName is the default file name of a content dict.
	ContentDictFileName = "content.dict"
	// IndexDictFileName is the default filename of a dictionary index.
	IndexDictFileName = "index.dict"
	// ConnectionDictFileName is the default filename of a connection dict.
	ConnectionDictFileName = "connection.dict"
	// CharDefDictFileName is the default filename of a char def.
	CharDefDictFileName = "chardef.dict"
	// UnkDictFileName is the default filename of an unknown dict.
	UnkDictFileName = "unk.dict"
	// DictInfoFileName is the file name of a dictionary info.
	DictInfoFileName = "dict.info"
)
const UserDictColumnSize = 4
UserDictColumnSize is the column size of the user dictionary.
Variables
This section is empty.
Functions
func NewContents
func NewContents(b []byte) [][]string
NewContents creates dictionary contents from a byte slice.
Types
type CharDef
type CharDef struct {
	CharClass    CharClass
	CharCategory CharCategory
	InvokeList   InvokeList
	GroupList    GroupList
}
CharDef represents the character definitions of a char.def file.
func ReadCharDef
ReadCharDef reads character definitions in char.def format.
type ConnectionTable
ConnectionTable represents a connection matrix of morphs.
func ReadConnectionTable
func ReadConnectionTable(r io.Reader) (ConnectionTable, error)
ReadConnectionTable loads a ConnectionTable from an io.Reader.
func (*ConnectionTable) At
func (t *ConnectionTable) At(row, col int) int16
At returns the connection cost of matrix[row, col].
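A connection matrix like this is commonly stored as one flat slice. The following minimal sketch assumes a row-major layout. The `connTable` type and its `Row`, `Col`, and `Vec` fields are hypothetical and are not claimed to match ConnectionTable's internals.

```go
package main

import "fmt"

// connTable is a hypothetical stand-in for ConnectionTable: a Row x Col
// matrix of int16 costs stored row by row in a flat slice (assumed layout).
type connTable struct {
	Row, Col int
	Vec      []int16
}

// At returns the cost stored at matrix[row, col].
func (m *connTable) At(row, col int) int16 {
	return m.Vec[m.Col*row+col]
}

func main() {
	// A 2x3 matrix:
	//   row 0: 10 20 30
	//   row 1: 40 50 60
	m := &connTable{Row: 2, Col: 3, Vec: []int16{10, 20, 30, 40, 50, 60}}
	fmt.Println(m.At(1, 2)) // 60
}
```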
type Contents
type Contents [][]string
Contents represents dictionary contents.
func ReadContents
ReadContents reads dictionary contents from an io.Reader.
type ContentsMeta
ContentsMeta represents the contents record information.
func ReadContentsMeta
func ReadContentsMeta(r io.Reader) (ContentsMeta, error)
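The metadata keys listed under Constants (such as `_reading`) appear to name columns of each Contents row. Here is a minimal sketch, assuming ContentsMeta behaves like a `map[string]uint32` from key to column index. That is an assumption, since its definition is not shown here. The `column` helper and the IPADIC-style sample row are hypothetical.

```go
package main

import "fmt"

// Key values copied from the package's constants.
const (
	BaseFormIndex = "_base"
	ReadingIndex  = "_reading"
)

// contentsMeta is a hypothetical stand-in for ContentsMeta: it maps a
// metadata key to a column index within each contents row.
type contentsMeta map[string]uint32

// column returns the field named by key from row. It reports false if the
// key is unknown or the row is too short to contain that column.
func column(meta contentsMeta, row []string, key string) (string, bool) {
	i, ok := meta[key]
	if !ok || int(i) >= len(row) {
		return "", false
	}
	return row[i], true
}

func main() {
	meta := contentsMeta{BaseFormIndex: 6, ReadingIndex: 7}
	contents := [][]string{
		{"名詞", "一般", "*", "*", "*", "*", "猫", "ネコ", "ネコ"},
	}
	reading, _ := column(meta, contents[0], ReadingIndex)
	fmt.Println(reading) // ネコ
}
```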
type Dict
type Dict struct {
	Morphs       Morphs
	POSTable     POSTable
	ContentsMeta ContentsMeta
	Contents     Contents
	Connection   ConnectionTable
	Index        IndexTable
	CharClass    CharClass
	CharCategory CharCategory
	InvokeList   InvokeList
	GroupList    GroupList
	UnkDict      UnkDict
	// contains filtered or unexported fields
}
Dict represents a dictionary of a tokenizer.
func LoadDictFile
LoadDictFile loads a dictionary from a file.
func LoadShrink
LoadShrink loads a dictionary from a file without its contents.
func (*Dict) CharacterCategory
CharacterCategory returns the category of a rune.
type GroupList
type GroupList []bool
GroupList represents, for each character category, whether to make a new word by grouping consecutive characters of that category.
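Grouping can be sketched as follows. Starting from the first rune, the candidate is extended while the category stays the same, but only if that category's group flag is set. The `category` function and the three category IDs are toy stand-ins for CharacterCategory and a real char.def. This illustrates the idea and is not the package's own unknown-word code.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Hypothetical category IDs; a real dictionary defines these in char.def.
const (
	catDefault = iota
	catDigit
	catAlpha
)

// category is a toy stand-in for Dict.CharacterCategory.
func category(r rune) int {
	switch {
	case unicode.IsDigit(r):
		return catDigit
	case unicode.IsLetter(r) && r < 0x80:
		return catAlpha
	default:
		return catDefault
	}
}

// unknownWordLen returns the byte length of the unknown-word candidate at
// the start of input. If group[cat] is true for the first rune's category,
// following runes of the same category are merged into the candidate;
// otherwise the candidate is just the first rune.
func unknownWordLen(input string, group []bool) int {
	r, size := utf8.DecodeRuneInString(input)
	if size == 0 {
		return 0
	}
	cat := category(r)
	end := size
	if !group[cat] {
		return end
	}
	for end < len(input) {
		r, size = utf8.DecodeRuneInString(input[end:])
		if category(r) != cat {
			break
		}
		end += size
	}
	return end
}

func main() {
	group := []bool{false, true, true} // hypothetical: group digits and ASCII letters
	fmt.Println(unknownWordLen("2024年", group)) // 4
}
```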
type IndexTable
type IndexTable struct {
	Da  trie.DoubleArray
	Dup map[int32]int32
}
IndexTable represents a dictionary index.
func BuildIndexTable
func BuildIndexTable(sortedKeywords []string) (IndexTable, error)
BuildIndexTable constructs an index table from sorted keywords.
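Because the keywords are sorted, identical keywords are adjacent, so duplicates can be tracked in a single pass. The hedged sketch below uses a Go map in place of the double-array trie. It assumes Dup maps a keyword's first id to its number of extra copies, which is inferred from the field types rather than documented. The `index` type is hypothetical, and the sortedness check is the sketch's own choice.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// index is a hypothetical stand-in for IndexTable: a plain map replaces the
// double-array trie (Da), and dup plays the role assumed for Dup.
type index struct {
	ids map[string]int32
	dup map[int32]int32
}

// buildIndex gives each distinct keyword the position of its first
// occurrence in sortedKeywords. Duplicates are adjacent in sorted input,
// and dup[id] counts how many extra copies follow that id.
func buildIndex(sortedKeywords []string) (index, error) {
	if !sort.StringsAreSorted(sortedKeywords) {
		return index{}, errors.New("keywords must be sorted")
	}
	idx := index{ids: map[string]int32{}, dup: map[int32]int32{}}
	for i, k := range sortedKeywords {
		if first, ok := idx.ids[k]; ok {
			idx.dup[first]++
			continue
		}
		idx.ids[k] = int32(i)
	}
	return idx, nil
}

func main() {
	idx, err := buildIndex([]string{"a", "b", "b", "b", "c"})
	if err != nil {
		panic(err)
	}
	fmt.Println(idx.ids["b"], idx.dup[1], idx.ids["c"]) // 1 2 4
}
```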
func ReadIndexTable
func ReadIndexTable(r io.Reader) (IndexTable, error)
ReadIndexTable loads an index table.
func (IndexTable) CommonPrefixSearch
func (idx IndexTable) CommonPrefixSearch(input string) (lens []int, ids [][]int)
CommonPrefixSearch finds the keywords that are prefixes of input and, if any are found, returns their lengths and ids.
func (IndexTable) CommonPrefixSearchCallback
func (idx IndexTable) CommonPrefixSearchCallback(input string, callback func(id, l int))
CommonPrefixSearchCallback finds the keywords that are prefixes of input and calls callback with the id and length of each match.
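Common prefix search reports every keyword that is a prefix of the input. The minimal sketch below works over a plain map; `prefixIndex` is hypothetical, and a trie would answer the same query in one walk. Reporting lengths in bytes is an assumption here. Unlike the real method, which returns `[][]int`, the sketch keeps a single id per keyword.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// prefixIndex is a hypothetical stand-in for IndexTable mapping each
// keyword to one id. Every prefix is checked explicitly for clarity.
type prefixIndex map[string]int

// commonPrefixSearchCallback calls callback(id, l) for every keyword that
// is a prefix of input, shortest first; l is the keyword length in bytes.
func (p prefixIndex) commonPrefixSearchCallback(input string, callback func(id, l int)) {
	for end := 0; end < len(input); {
		_, size := utf8.DecodeRuneInString(input[end:])
		end += size
		if id, ok := p[input[:end]]; ok {
			callback(id, end)
		}
	}
}

// commonPrefixSearch collects the same matches into slices.
func (p prefixIndex) commonPrefixSearch(input string) (lens []int, ids []int) {
	p.commonPrefixSearchCallback(input, func(id, l int) {
		lens = append(lens, l)
		ids = append(ids, id)
	})
	return lens, ids
}

func main() {
	idx := prefixIndex{"東": 0, "東京": 1, "東京都": 2, "京都": 3}
	lens, ids := idx.commonPrefixSearch("東京都庁")
	fmt.Println(lens, ids) // [3 6 9] [0 1 2]
}
```

The slice form is built on the callback form, which avoids allocating when a caller only needs to visit the matches.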
func (IndexTable) Search
func (idx IndexTable) Search(input string) []int
Search finds the given keyword and returns its ids if found.
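Search returns a slice because one keyword can have several entries. The lookup below assumes that duplicates receive consecutive ids after the first one and that Dup stores their count. This is consistent with the IndexTable fields but not documented. `searchIndex` is hypothetical.

```go
package main

import "fmt"

// searchIndex is a hypothetical stand-in for IndexTable: first maps a
// keyword to the id of its first entry, and dup counts the additional
// entries that share the keyword and follow it with consecutive ids.
type searchIndex struct {
	first map[string]int32
	dup   map[int32]int32
}

// search returns every id registered for keyword, or nil if it is absent.
func (s searchIndex) search(keyword string) []int {
	id, ok := s.first[keyword]
	if !ok {
		return nil
	}
	n := int(s.dup[id]) + 1
	ids := make([]int, 0, n)
	for i := 0; i < n; i++ {
		ids = append(ids, int(id)+i)
	}
	return ids
}

func main() {
	s := searchIndex{
		first: map[string]int32{"a": 0, "b": 1, "c": 4},
		dup:   map[int32]int32{1: 2},
	}
	fmt.Println(s.search("b")) // [1 2 3]
	fmt.Println(s.search("z")) // []
}
```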
type Info (added in v1.1.0)
Info represents the dictionary info.
func ReadDictInfo (added in v1.1.0)
ReadDictInfo reads gob-encoded dictionary info and returns it.
For backward compatibility, if a dictionary name is not defined or empty, it returns UndefinedDictName.
type InvokeList
type InvokeList []bool
InvokeList represents, for each character category, whether to invoke unknown-word processing.
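The decision reduces to one expression if we assume the package follows the MeCab char.def INVOKE convention. Under that convention, unknown-word processing always runs when the flag is set and otherwise runs only where no dictionary word matches. `shouldInvoke` is a hypothetical helper.

```go
package main

import "fmt"

// shouldInvoke reports whether unknown-word candidates should be generated
// at a position whose first character has category cat. It assumes the
// MeCab INVOKE convention: if invoke[cat] is true, processing always runs;
// otherwise it runs only if no known dictionary word starts here.
func shouldInvoke(invoke []bool, cat int, knownWordFound bool) bool {
	return invoke[cat] || !knownWordFound
}

func main() {
	invoke := []bool{false, true} // hypothetical: category 1 always invokes
	fmt.Println(shouldInvoke(invoke, 0, true))  // false
	fmt.Println(shouldInvoke(invoke, 0, false)) // true
	fmt.Println(shouldInvoke(invoke, 1, true))  // true
}
```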
type Morph
type Morph struct {
	LeftID, RightID, Weight int16
}
Morph represents the left and right part-of-speech IDs and the occurrence cost (Weight) of a morph.
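In MeCab-style lattices, a path is scored by adding each morph's Weight to the connection costs between adjacent morphs. The sketch below assumes the connection cost is looked up by the previous morph's RightID and the next morph's LeftID. It also ignores the BOS/EOS nodes a real lattice adds. `pathCost` is a hypothetical helper.

```go
package main

import "fmt"

// morph mirrors the exported fields of Morph.
type morph struct {
	LeftID, RightID, Weight int16
}

// pathCost sums the occurrence cost of every morph plus the connection cost
// between each adjacent pair. conn(right, left) is assumed to return the
// cost of a morph with RightID right followed by one with LeftID left.
// The total is accumulated in int to avoid int16 overflow.
func pathCost(ms []morph, conn func(right, left int16) int16) int {
	total := 0
	for i, m := range ms {
		total += int(m.Weight)
		if i > 0 {
			total += int(conn(ms[i-1].RightID, m.LeftID))
		}
	}
	return total
}

func main() {
	ms := []morph{
		{LeftID: 1, RightID: 1, Weight: 100},
		{LeftID: 2, RightID: 2, Weight: 50},
	}
	conn := func(right, left int16) int16 { return 10*right + left }
	fmt.Println(pathCost(ms, conn)) // 162
}
```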
type Morphs
type Morphs []Morph
Morphs represents a slice of morphs.
func ReadMorphs
ReadMorphs loads morph data from an io.Reader.
type POSMap
POSMap represents a part-of-speech control table.
type POSTable
POSTable represents a table for managing parts of speech.
func ReadPOSTable
ReadPOSTable loads a POS table.
type SizeReaderAt
SizeReaderAt is the interface that wraps the Size and ReadAt methods.
func MultiSizeReaderAt
func MultiSizeReaderAt(rs ...SizeReaderAt) SizeReaderAt
MultiSizeReaderAt returns a SizeReaderAt that is the logical concatenation of the provided input readers.
type UnkDict
type UnkDict struct {
	Morphs       Morphs
	Index        map[int32]int32
	IndexDup     map[int32]int32
	ContentsMeta ContentsMeta
	Contents     Contents
}
UnkDict represents the unknown-word part of a dictionary.
func ReadUnkDic
ReadUnkDic loads an unknown-word dictionary.
type UserDicRecord
type UserDicRecord struct {
	Text   string   `json:"text"`
	Tokens []string `json:"tokens"`
	Yomi   []string `json:"yomi"`
	Pos    string   `json:"pos"`
}
UserDicRecord represents a record of the user dictionary file format.
type UserDict
type UserDict struct {
	Index    IndexTable
	Contents []UserDictContent
}
UserDict represents a user dictionary.
func NewUserDict
NewUserDict builds a user dictionary from a file.
type UserDictContent
UserDictContent represents the contents of a word in a user dictionary.
type UserDictRecords
type UserDictRecords []UserDicRecord
UserDictRecords represents user dictionary data.
func NewUserDicRecords
func NewUserDicRecords(r io.Reader) (UserDictRecords, error)
NewUserDicRecords loads user dictionary data from an io.Reader.
func (UserDictRecords) Len
func (u UserDictRecords) Len() int
func (UserDictRecords) Less
func (u UserDictRecords) Less(i, j int) bool
func (UserDictRecords) NewUserDict
func (u UserDictRecords) NewUserDict() (*UserDict, error)
NewUserDict builds a user dictionary.
func (UserDictRecords) Swap
func (u UserDictRecords) Swap(i, j int)