dic

package
v1.7.1
Warning

This package is not in the latest version of its module.

Published: Jan 5, 2018 License: Apache-2.0 Imports: 14 Imported by: 0

Documentation

Overview

Package dic implements the dictionary of the morph analyzer.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Constants

const (
	// IPADicPath represents the internal IPA dictionary path.
	IPADicPath = "dic/ipa/ipa.dic"
	// UniDicPath represents the internal UniDic dictionary path.
	UniDicPath = "dic/uni/uni.dic"
)
const UserDicColumnSize = 4

UserDicColumnSize is the column size of the user dictionary.

Variables

This section is empty.

Functions

func NewContents added in v1.3.0

func NewContents(b []byte) [][]string

NewContents creates dictionary contents from a byte slice.

Types

type ConnectionTable

type ConnectionTable struct {
	Row, Col int64
	Vec      []int16
}

ConnectionTable represents a connection matrix of morphs.

func LoadConnectionTable

func LoadConnectionTable(r io.Reader) (t ConnectionTable, err error)

LoadConnectionTable loads ConnectionTable from io.Reader.

func (*ConnectionTable) At

func (t *ConnectionTable) At(row, col int) int16

At returns the connection cost of matrix[row, col].

func (ConnectionTable) WriteTo

func (t ConnectionTable) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface.

type Contents added in v1.3.0

type Contents [][]string

Contents represents dictionary contents.

func (Contents) WriteTo added in v1.3.0

func (c Contents) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface.

type Dic

type Dic struct {
	Morphs       []Morph
	POSTable     POSTable
	Contents     [][]string
	Connection   ConnectionTable
	Index        IndexTable
	CharClass    []string
	CharCategory []byte
	InvokeList   []bool
	GroupList    []bool

	UnkDic
}

Dic represents a dictionary of a tokenizer.

func Load

func Load(path string) (d *Dic, err error)

Load loads a dictionary from a file.

func LoadSimple added in v1.7.1

func LoadSimple(path string) (d *Dic, err error)

LoadSimple loads a dictionary from a file without contents.

func SysDic

func SysDic() *Dic

SysDic returns the kagome system dictionary.

func SysDicIPA

func SysDicIPA() *Dic

SysDicIPA returns the IPA system dictionary.

func SysDicIPASimple added in v1.7.0

func SysDicIPASimple() *Dic

SysDicIPASimple returns the IPA system dictionary without contents.

func SysDicSimple added in v1.7.0

func SysDicSimple() *Dic

SysDicSimple returns the kagome system dictionary without contents.

func SysDicUni added in v1.3.0

func SysDicUni() *Dic

SysDicUni returns the UniDic system dictionary.

func SysDicUniSimple added in v1.7.0

func SysDicUniSimple() *Dic

SysDicUniSimple returns the UniDic system dictionary without contents.

func (Dic) CharacterCategory added in v1.4.0

func (d Dic) CharacterCategory(r rune) byte

CharacterCategory returns the category of a rune.

type IndexTable

type IndexTable struct {
	Da  da.DoubleArray
	Dup map[int32]int32
}

IndexTable represents a dictionary index.

func BuildIndexTable

func BuildIndexTable(sortedKeywords []string) (IndexTable, error)

BuildIndexTable constructs an index table from sorted keywords.

func ReadIndexTable

func ReadIndexTable(r io.Reader) (IndexTable, error)

ReadIndexTable loads an index table.

func (IndexTable) CommonPrefixSearch

func (idx IndexTable) CommonPrefixSearch(input string) (lens []int, ids [][]int)

CommonPrefixSearch finds the keywords that are prefixes of the input and returns their ids and lengths if any are found.

func (IndexTable) CommonPrefixSearchCallback added in v1.5.1

func (idx IndexTable) CommonPrefixSearchCallback(input string, callback func(id, l int))

CommonPrefixSearchCallback finds the keywords that are prefixes of the input and invokes the callback with the id and length of each match.
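The idea of a common prefix search can be sketched without a double array. The example below is a simplified illustration only: it uses a plain map as the index, returns a single id per keyword (the documented method returns a slice of ids per match), and measures lengths in bytes, which is an assumption. The function name commonPrefixSearch and the sample keywords are hypothetical.

```go
package main

import "fmt"

// commonPrefixSearch reports every keyword in index that is a prefix of input.
// It returns the byte length of each match and the id stored for it.
// A plain map stands in for the index here; it is not a double array.
func commonPrefixSearch(index map[string]int, input string) (lens []int, ids []int) {
	// Ranging over a string yields the byte offset of each rune start,
	// so input[:i] is always cut on a rune boundary.
	for i := range input {
		if i == 0 {
			continue // skip the empty prefix
		}
		if id, ok := index[input[:i]]; ok {
			lens = append(lens, i)
			ids = append(ids, id)
		}
	}
	// The whole input is also a candidate prefix.
	if id, ok := index[input]; ok && input != "" {
		lens = append(lens, len(input))
		ids = append(ids, id)
	}
	return lens, ids
}

func main() {
	index := map[string]int{"東": 0, "東京": 1, "東京都": 2, "京都": 3}
	lens, ids := commonPrefixSearch(index, "東京都庁")
	fmt.Println(lens, ids)
}
```

A trie or double array answers the same query in one pass over the input instead of one map lookup per prefix, which is why a tokenizer would prefer it.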

func (IndexTable) Search

func (idx IndexTable) Search(input string) []int

Search finds the given keyword and returns the id if found.

func (IndexTable) WriteTo

func (idx IndexTable) WriteTo(w io.Writer) (n int64, err error)

WriteTo saves an index table.

type Morph

type Morph struct {
	LeftID, RightID, Weight int16
}

Morph represents the parts of speech and an occurrence cost.

func LoadMorphSlice

func LoadMorphSlice(r io.Reader) ([]Morph, error)

LoadMorphSlice loads morph data from io.Reader.

type MorphSlice

type MorphSlice []Morph

MorphSlice represents a slice of morphs.

func (MorphSlice) WriteTo

func (m MorphSlice) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface.

type POS added in v1.7.0

type POS []POSID

POS represents a vector of part-of-speech IDs.

type POSID added in v1.7.0

type POSID int16

POSID represents the ID of a part of speech.

type POSMap added in v1.7.0

type POSMap map[string]POSID

POSMap represents a part of speech control table.

func (POSMap) Add added in v1.7.0

func (p POSMap) Add(pos []string) POS

Add adds a part-of-speech item to the POS control table and returns its IDs.

func (POSMap) List added in v1.7.0

func (p POSMap) List() []string

List returns a list whose index is POS ID and value is its name.
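The relationship between Add and List can be illustrated with a small sketch. The types below are local stand-ins for POSID, POS and POSMap. Assigning IDs sequentially from 0 is an assumption of this example, and the library may number them differently.

```go
package main

import "fmt"

// Local stand-ins for the documented POSID, POS and POSMap types.
type posID int16
type pos []posID
type posMap map[string]posID

// add registers each name that is not yet known and returns the IDs
// of all given names in order.
// Assumption: IDs are assigned sequentially, starting from 0.
func (p posMap) add(names []string) pos {
	ret := make(pos, 0, len(names))
	for _, name := range names {
		id, ok := p[name]
		if !ok {
			id = posID(len(p))
			p[name] = id
		}
		ret = append(ret, id)
	}
	return ret
}

// list returns a slice whose index is the ID and whose value is the name.
func (p posMap) list() []string {
	ret := make([]string, len(p))
	for name, id := range p {
		ret[id] = name
	}
	return ret
}

func main() {
	m := posMap{}
	fmt.Println(m.add([]string{"名詞", "一般"}))
	fmt.Println(m.add([]string{"名詞", "固有名詞"})) // "名詞" reuses its existing ID
	fmt.Println(m.list())
}
```

Storing each distinct name once and referring to it by a small integer keeps per-morph part-of-speech data compact.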

type POSTable added in v1.7.0

type POSTable struct {
	POSs     []POS
	NameList []string
}

POSTable represents a table for managing parts of speech.

func ReadPOSTable added in v1.7.0

func ReadPOSTable(r io.Reader) (POSTable, error)

ReadPOSTable loads a POS table.

func (POSTable) GetPOSName added in v1.7.0

func (p POSTable) GetPOSName(pos POS) []string

GetPOSName returns a vector of part-of-speech names.

func (POSTable) WriteTo added in v1.7.0

func (p POSTable) WriteTo(w io.Writer) (int64, error)

WriteTo saves a POS table.

type Trie

type Trie interface {
	Search(input string) []int32
	PrefixSearch(input string) (length int, output []int32)
	CommonPrefixSearch(input string) (lens []int, outputs [][]int32)
	CommonPrefixSearchCallback(input string, callback func(id, l int))
}

Trie is an interface representing retrieval ability.

type UnkDic added in v1.7.1

type UnkDic struct {
	UnkMorphs   []Morph
	UnkIndex    map[int32]int32
	UnkIndexDup map[int32]int32
	UnkContents [][]string
}

UnkDic represents an unknown word dictionary part.

func ReadUnkDic added in v1.7.1

func ReadUnkDic(r io.Reader) (UnkDic, error)

ReadUnkDic loads an unknown word dictionary.

func (UnkDic) WriteTo added in v1.7.1

func (u UnkDic) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface.

type UserDic

type UserDic struct {
	Index    IndexTable
	Contents []UserDicContent
}

UserDic represents a user dictionary.

func NewUserDic

func NewUserDic(path string) (udic *UserDic, err error)

NewUserDic builds a user dictionary from a file.

type UserDicContent

type UserDicContent struct {
	Tokens []string
	Yomi   []string
	Pos    string
}

UserDicContent represents contents of a word in a user dictionary.
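As a rough illustration of how a four-column record might map onto UserDicContent, consider the sketch below. The column order (surface form, tokens, yomi, part of speech), the comma separator between columns, and the whitespace separator inside the tokens and yomi columns are all assumptions of this example and are not stated on this page. The parseRecord function is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// userDicColumnSize matches the documented UserDicColumnSize constant.
const userDicColumnSize = 4

// userDicContent mirrors the documented fields of UserDicContent.
type userDicContent struct {
	Tokens []string
	Yomi   []string
	Pos    string
}

// parseRecord splits one line into its surface form and content.
// Assumption: columns are "surface,tokens,yomi,pos", with tokens and
// yomi separated by whitespace inside their columns.
func parseRecord(line string) (surface string, c userDicContent, err error) {
	cols := strings.Split(line, ",")
	if len(cols) != userDicColumnSize {
		return "", c, fmt.Errorf("want %d columns, got %d", userDicColumnSize, len(cols))
	}
	c.Tokens = strings.Fields(cols[1])
	c.Yomi = strings.Fields(cols[2])
	c.Pos = cols[3]
	return cols[0], c, nil
}

func main() {
	s, c, err := parseRecord("日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞")
	fmt.Println(s, c.Tokens, c.Yomi, c.Pos, err)
}
```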
