Documentation
Overview
Package dic implements the dictionary of the morph analyzer.
Copyright 2018 ikawaha
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Index
Constants
const (
	// IPADicPath represents the internal IPA dictionary path.
	IPADicPath = "dic/ipa/ipa.dic"
	// UniDicPath represents the internal UniDic dictionary path.
	UniDicPath = "dic/uni/uni.dic"
)
const UserDicColumnSize = 4
UserDicColumnSize is the number of columns in a user dictionary record.
Variables
This section is empty.
Functions
func NewContents added in v1.3.0
NewContents creates dictionary contents from a byte slice.
Types
type ConnectionTable
ConnectionTable represents a connection matrix of morphs.
func LoadConnectionTable
func LoadConnectionTable(r io.Reader) (t ConnectionTable, err error)
LoadConnectionTable loads a ConnectionTable from an io.Reader.
func (*ConnectionTable) At
func (t *ConnectionTable) At(row, col int) int16
At returns the connection cost of matrix[row, col].
type Dic
type Dic struct {
	Morphs       []Morph
	POSTable     POSTable
	Contents     [][]string
	Connection   ConnectionTable
	Index        IndexTable
	CharClass    []string
	CharCategory []byte
	InvokeList   []bool
	GroupList    []bool
	UnkDic
}
Dic represents a dictionary of a tokenizer.
func LoadSimple added in v1.7.1
LoadSimple loads a dictionary, without its contents, from a file.
func SysDicIPASimple added in v1.7.0
func SysDicIPASimple() *Dic
SysDicIPASimple returns the IPA system dictionary without contents.
func SysDicSimple added in v1.7.0
func SysDicSimple() *Dic
SysDicSimple returns the kagome system dictionary without contents.
func SysDicUni added in v1.3.0
func SysDicUni() *Dic
SysDicUni returns the UniDic system dictionary.
func SysDicUniSimple added in v1.7.0
func SysDicUniSimple() *Dic
SysDicUniSimple returns the UniDic system dictionary without contents.
func (Dic) CharacterCategory added in v1.4.0
CharacterCategory returns the category of a rune.
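The Dic fields CharClass ([]string) and CharCategory ([]byte) suggest a rune-indexed lookup table. The sketch below models that idea with a hypothetical miniDic type; the category codes, the table size, and the rule of falling back to a default category for runes outside the table are all assumptions for illustration, not confirmed library behavior.

```go
package main

import "fmt"

// Hypothetical category codes; real values would come from Dic.CharClass.
const (
	catDefault byte = iota
	catHiragana
	catKatakana
)

// miniDic is a hypothetical stand-in holding only the fields this sketch uses.
type miniDic struct {
	CharClass    []string // category names, indexed by category code
	CharCategory []byte   // category code, indexed by rune value
}

// CharacterCategory looks up the category of r. Runes beyond the table
// fall back to the default category (an assumption of this sketch).
func (d miniDic) CharacterCategory(r rune) byte {
	if r >= 0 && int(r) < len(d.CharCategory) {
		return d.CharCategory[r]
	}
	return catDefault
}

func main() {
	d := miniDic{
		CharClass:    []string{"DEFAULT", "HIRAGANA", "KATAKANA"},
		CharCategory: make([]byte, 0x3100), // covers U+0000..U+30FF
	}
	for r := 'ぁ'; r <= 'ゖ'; r++ {
		d.CharCategory[r] = catHiragana
	}
	for r := 'ァ'; r <= 'ヺ'; r++ {
		d.CharCategory[r] = catKatakana
	}
	fmt.Println(d.CharClass[d.CharacterCategory('か')]) // HIRAGANA
	fmt.Println(d.CharClass[d.CharacterCategory('カ')]) // KATAKANA
	fmt.Println(d.CharClass[d.CharacterCategory('漢')]) // DEFAULT
}
```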
type IndexTable
type IndexTable struct {
	Da  da.DoubleArray
	Dup map[int32]int32
}
IndexTable represents a dictionary index.
func BuildIndexTable
func BuildIndexTable(sortedKeywords []string) (IndexTable, error)
BuildIndexTable constructs an index table from sorted keywords.
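IndexTable pairs a double array (Da) with a Dup map. A plausible reading, assumed here, is that a keyword repeated in the sorted input is stored once under the ID of its first occurrence, and Dup counts the extra entries. The sketch replaces da.DoubleArray with a plain map so it has no dependencies; indexTable and buildIndexTable are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// indexTable is a hypothetical stand-in for IndexTable. A map replaces
// the double array so the sketch is self-contained.
type indexTable struct {
	ids map[string]int32 // keyword -> ID of its first occurrence
	Dup map[int32]int32  // first ID -> number of extra duplicates
}

// buildIndexTable gives each keyword its position in the sorted input.
// Repeated keywords are stored once; Dup records how many further
// entries share that keyword (assumed semantics).
func buildIndexTable(sortedKeywords []string) (indexTable, error) {
	idx := indexTable{ids: map[string]int32{}, Dup: map[int32]int32{}}
	if !sort.StringsAreSorted(sortedKeywords) {
		return idx, errors.New("keywords are not sorted")
	}
	for i, k := range sortedKeywords {
		if first, ok := idx.ids[k]; ok {
			idx.Dup[first]++
			continue
		}
		idx.ids[k] = int32(i)
	}
	return idx, nil
}

func main() {
	idx, err := buildIndexTable([]string{"a", "ab", "ab", "ab", "b"})
	if err != nil {
		panic(err)
	}
	fmt.Println(idx.ids["ab"], idx.Dup[1]) // 1 2
}
```

Requiring sorted input means duplicates are adjacent, so IDs of one keyword form a contiguous range that a single first ID plus a count can describe.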
func ReadIndexTable
func ReadIndexTable(r io.Reader) (IndexTable, error)
ReadIndexTable loads an index table from an io.Reader.
func (IndexTable) CommonPrefixSearch
func (idx IndexTable) CommonPrefixSearch(input string) (lens []int, ids [][]int)
CommonPrefixSearch finds the keywords that are prefixes of the input and, if any are found, returns the length of each match together with its IDs.
func (IndexTable) CommonPrefixSearchCallback added in v1.5.1
func (idx IndexTable) CommonPrefixSearchCallback(input string, callback func(id, l int))
CommonPrefixSearchCallback finds the keywords that are prefixes of the input and calls callback with the ID and length of each match.
func (IndexTable) Search
func (idx IndexTable) Search(input string) []int
Search finds the given keyword and, if it is found, returns its IDs.
type Morph
type Morph struct {
LeftID, RightID, Weight int16
}
Morph represents the parts of speech and the occurrence cost of a morph.
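A Morph's Weight is its occurrence cost, and its LeftID and RightID index into the connection matrix. The sketch below scores a candidate path the way MeCab-style analyzers conventionally do: each morph's Weight plus At(previous.RightID, current.LeftID) for each adjacent pair. That convention, the conn type, and the pathCost function are assumptions for illustration; a real tokenizer would also add the connection costs at the sentence start and end, which are omitted here.

```go
package main

import "fmt"

// Morph mirrors the documented Morph fields.
type Morph struct {
	LeftID, RightID, Weight int16
}

// conn is a hypothetical row-major connection matrix. Rows are indexed by
// the preceding morph's RightID and columns by the following morph's
// LeftID (the usual MeCab-style convention, assumed here).
type conn struct {
	col int
	vec []int16
}

func (c conn) At(row, col int) int16 { return c.vec[c.col*row+col] }

// pathCost sums each morph's Weight and the connection cost between every
// adjacent pair. Costs are widened to int so the sum cannot overflow int16.
func pathCost(c conn, path []Morph) int {
	total := 0
	for i, m := range path {
		total += int(m.Weight)
		if i > 0 {
			total += int(c.At(int(path[i-1].RightID), int(m.LeftID)))
		}
	}
	return total
}

func main() {
	c := conn{col: 2, vec: []int16{0, 100, -50, 30}} // 2x2 matrix
	path := []Morph{
		{LeftID: 0, RightID: 1, Weight: 200},
		{LeftID: 0, RightID: 0, Weight: 300},
	}
	fmt.Println(pathCost(c, path)) // 450: 200 + 300 + At(1, 0), where At(1, 0) is -50
}
```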
type POSMap added in v1.7.0
POSMap represents a part-of-speech control table.
type POSTable added in v1.7.0
POSTable represents a table for managing parts of speech.
func ReadPOSTable added in v1.7.0
ReadPOSTable loads a POS table.
func (POSTable) GetPOSName added in v1.7.0
GetPOSName returns a vector of part-of-speech names.
type Trie
type Trie interface {
	Search(input string) []int32
	PrefixSearch(input string) (length int, output []int32)
	CommonPrefixSearch(input string) (lens []int, outputs [][]int32)
	CommonPrefixSearchCallback(input string, callback func(id, l int))
}
Trie is an interface representing retrieval ability.
type UnkDic added in v1.7.1
type UnkDic struct {
	UnkMorphs   []Morph
	UnkIndex    map[int32]int32
	UnkIndexDup map[int32]int32
	UnkContents [][]string
}
UnkDic represents an unknown word dictionary part.
func ReadUnkDic added in v1.7.1
ReadUnkDic loads an unknown word dictionary.
type UserDic
type UserDic struct {
	Index    IndexTable
	Contents []UserDicContent
}
UserDic represents a user dictionary.
func NewUserDic
NewUserDic builds a user dictionary from a file.
type UserDicContent
UserDicContent represents contents of a word in a user dictionary.
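UserDicColumnSize is 4, so each user dictionary record has four columns. The sketch below parses records in an assumed layout (surface form, space-separated segmentation, space-separated readings, part of speech) using encoding/csv with FieldsPerRecord enforcing the column count. userDicContent and parseUserDic are hypothetical; the real UserDicContent fields and NewUserDic's exact input rules are not shown in this documentation.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

const userDicColumnSize = 4 // mirrors the documented UserDicColumnSize

// userDicContent is a hypothetical shape for one user-dictionary word.
type userDicContent struct {
	Tokens []string // segmentation of the surface form
	Yomi   []string // one reading per token
	Pos    string   // part of speech
}

// parseUserDic reads records in an ASSUMED layout:
// surface, space-separated segmentation, space-separated readings, POS.
// Lines starting with '#' are treated as comments.
func parseUserDic(src string) ([]string, []userDicContent, error) {
	r := csv.NewReader(strings.NewReader(src))
	r.Comment = '#'
	r.FieldsPerRecord = userDicColumnSize
	records, err := r.ReadAll()
	if err != nil {
		return nil, nil, err
	}
	var surfaces []string
	var contents []userDicContent
	for _, rec := range records {
		tokens := strings.Fields(rec[1])
		yomi := strings.Fields(rec[2])
		if len(tokens) != len(yomi) {
			return nil, nil, fmt.Errorf("%q: %d tokens but %d readings", rec[0], len(tokens), len(yomi))
		}
		surfaces = append(surfaces, rec[0])
		contents = append(contents, userDicContent{Tokens: tokens, Yomi: yomi, Pos: rec[3]})
	}
	return surfaces, contents, nil
}

func main() {
	src := "# surface,segmentation,readings,pos\n" +
		"日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞\n"
	surfaces, contents, err := parseUserDic(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(surfaces[0], contents[0].Tokens, contents[0].Pos)
	// 日本経済新聞 [日本 経済 新聞] カスタム名詞
}
```

Since BuildIndexTable takes sorted keywords, a builder along these lines would presumably sort the surfaces (keeping each content aligned with its surface) before indexing them.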