dic

package
v1.1.2
Published: Nov 21, 2019 License: Apache-2.0 Imports: 14 Imported by: 0

Documentation

Overview

Package dic implements the dictionary of the morphological analyzer.

Constants

const (
	// IPADicPath represents the internal IPA dictionary path.
	IPADicPath = "dic/ipa"
)
const UserDicColumnSize = 4

UserDicColumnSize is the column size of the user dictionary.

Functions

func NewContents

func NewContents(b []byte) [][]string

NewContents creates dictionary contents from a byte slice.

Types

type ConnectionTable

type ConnectionTable struct {
	Row, Col int64
	Vec      []int16
}

ConnectionTable represents a connection matrix of morphs.

func LoadConnectionTable

func LoadConnectionTable(r io.Reader) (t ConnectionTable, err error)

LoadConnectionTable loads ConnectionTable from io.Reader.

func (*ConnectionTable) At

func (t *ConnectionTable) At(row, col int) int16

At returns the connection cost of matrix[row, col].
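The lookup performed by At can be pictured as indexing a flat slice. The following is a minimal sketch, not the package's code: `connectionTable` and `at` are local stand-ins that mirror the documented fields, and the row-major layout of `Vec` is an assumption.

```go
package main

import "fmt"

// connectionTable mirrors the documented ConnectionTable fields.
// It is a local stand-in for illustration, not the package's type.
type connectionTable struct {
	Row, Col int64
	Vec      []int16
}

// at returns the cost stored for (row, col).
// Assumption: Vec holds the matrix in row-major order.
func (t *connectionTable) at(row, col int) int16 {
	return t.Vec[t.Col*int64(row)+int64(col)]
}

func main() {
	t := connectionTable{
		Row: 2, Col: 3,
		Vec: []int16{
			10, 20, 30, // row 0
			40, 50, 60, // row 1
		},
	}
	fmt.Println(t.at(0, 1), t.at(1, 2)) // prints "20 60"
}
```

Storing the matrix as one slice keeps the table contiguous in memory, which suits a structure that is read once per candidate pair of adjacent morphs.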

func (ConnectionTable) WriteTo

func (t ConnectionTable) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface.

type Contents

type Contents [][]string

Contents represents dictionary contents.

func (Contents) WriteTo

func (c Contents) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface.

type Dic

type Dic struct {
	Morphs       []Morph
	POSTable     POSTable
	Contents     [][]string
	Connection   ConnectionTable
	Index        IndexTable
	CharClass    []string
	CharCategory []byte
	InvokeList   []bool
	GroupList    []bool
	UnkMorphs    []Morph
	UnkIndex     map[int32]int32
	UnkIndexDup  map[int32]int32
	UnkContents  [][]string
}

Dic represents a dictionary of a tokenizer.

func Load

func Load(path string) (d *Dic, err error)

Load loads a dictionary from a file.

func SysDic

func SysDic() *Dic

SysDic returns the kagome system dictionary.

func SysDicIPA

func SysDicIPA() *Dic

SysDicIPA returns the IPA system dictionary.

func SysDicIPASimple

func SysDicIPASimple() *Dic

SysDicIPASimple returns the IPA system dictionary without contents.

func SysDicSimple

func SysDicSimple() *Dic

SysDicSimple returns the kagome system dictionary without contents.

func (Dic) CharacterCategory

func (d Dic) CharacterCategory(r rune) byte

CharacterCategory returns the category of a rune.

type IndexTable

type IndexTable struct {
	Da  da.DoubleArray
	Dup map[int32]int32
}

IndexTable represents a dictionary index.

func BuildIndexTable

func BuildIndexTable(sortedKeywords []string) (IndexTable, error)

BuildIndexTable constructs an index table from sorted keywords.

func ReadIndexTable

func ReadIndexTable(r io.Reader) (IndexTable, error)

ReadIndexTable loads an index table.

func (IndexTable) CommonPrefixSearch

func (idx IndexTable) CommonPrefixSearch(input string) (lens []int, ids [][]int)

CommonPrefixSearch finds the keywords that are prefixes of the input and returns their ids and lengths if found.

func (IndexTable) CommonPrefixSearchCallback

func (idx IndexTable) CommonPrefixSearchCallback(input string, callback func(id, l int))

CommonPrefixSearchCallback finds the keywords that are prefixes of the input and invokes callback with the id and length of each match.
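To make the shape of the two common-prefix results concrete, here is a naive sketch that uses a map instead of a double array. `naiveIndex` and its methods are hypothetical names, and reporting lengths in bytes is an assumption; the real index may differ in both representation and ordering.

```go
package main

import "fmt"

// naiveIndex is a hypothetical stand-in for an index table: it maps each
// keyword to the ids registered for it.
type naiveIndex map[string][]int

// commonPrefixSearch reports every keyword that is a prefix of input,
// shortest first, as parallel slices of byte lengths and ids.
func (idx naiveIndex) commonPrefixSearch(input string) (lens []int, ids [][]int) {
	for i := 1; i <= len(input); i++ {
		if v, ok := idx[input[:i]]; ok {
			lens = append(lens, i)
			ids = append(ids, v)
		}
	}
	return lens, ids
}

// commonPrefixSearchCallback visits the same matches but calls back once
// per id instead of allocating result slices.
func (idx naiveIndex) commonPrefixSearchCallback(input string, callback func(id, l int)) {
	for i := 1; i <= len(input); i++ {
		for _, id := range idx[input[:i]] {
			callback(id, i)
		}
	}
}

func main() {
	idx := naiveIndex{"a": {0}, "ab": {1, 2}, "abd": {3}, "b": {4}}
	lens, ids := idx.commonPrefixSearch("abc")
	fmt.Println(lens, ids) // prints "[1 2] [[0] [1 2]]"
}
```

The callback form avoids building the intermediate slices, which matters when the search runs at every position of the input during lattice construction.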

func (IndexTable) Search

func (idx IndexTable) Search(input string) []int

Search finds the given keyword and returns its ids if found.

func (IndexTable) WriteTo

func (idx IndexTable) WriteTo(w io.Writer) (n int64, err error)

WriteTo saves an index table.

type Morph

type Morph struct {
	LeftID, RightID, Weight int16
}

Morph represents parts of speech and an occurrence cost.

func LoadMorphSlice

func LoadMorphSlice(r io.Reader) ([]Morph, error)

LoadMorphSlice loads morph data from io.Reader.

type MorphSlice

type MorphSlice []Morph

MorphSlice represents a slice of morphs.

func (MorphSlice) WriteTo

func (m MorphSlice) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface.
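The pairing of a WriteTo method with a loader function can be sketched as a binary round trip. Everything below is illustrative: `morph`, `morphSlice`, `loadMorphSlice`, and `roundTrip` are local stand-ins, and the little-endian, length-prefixed layout is an assumption, since the on-disk format is not described here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// morph mirrors the documented Morph fields (local stand-in).
type morph struct{ LeftID, RightID, Weight int16 }

type morphSlice []morph

// Compile-time check that morphSlice satisfies io.WriterTo.
var _ io.WriterTo = morphSlice(nil)

// WriteTo writes an int64 length prefix followed by the fixed-size records.
// Assumption: little-endian encoding, chosen only for this sketch.
func (m morphSlice) WriteTo(w io.Writer) (int64, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, int64(len(m))); err != nil {
		return 0, err
	}
	if err := binary.Write(&buf, binary.LittleEndian, []morph(m)); err != nil {
		return 0, err
	}
	return buf.WriteTo(w)
}

// loadMorphSlice reads back what WriteTo produced.
func loadMorphSlice(r io.Reader) ([]morph, error) {
	var n int64
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return nil, err
	}
	if n < 0 {
		return nil, errors.New("negative morph count")
	}
	m := make([]morph, n)
	if err := binary.Read(r, binary.LittleEndian, m); err != nil {
		return nil, err
	}
	return m, nil
}

// roundTrip serializes m into memory and loads it again.
func roundTrip(m morphSlice) ([]morph, int64, error) {
	var buf bytes.Buffer
	n, err := m.WriteTo(&buf)
	if err != nil {
		return nil, n, err
	}
	got, err := loadMorphSlice(&buf)
	return got, n, err
}

func main() {
	got, n, err := roundTrip(morphSlice{{1, 2, 30}, {4, 5, -60}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n, got) // prints "20 [{1 2 30} {4 5 -60}]"
}
```

Implementing io.WriterTo lets callers stream the value to any io.Writer, such as a file or an archive entry, without first materializing a byte slice themselves.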

type POS

type POS []POSID

POS represents a vector of part-of-speech IDs.

type POSID

type POSID int16

POSID represents an ID of a part of speech.

type POSMap

type POSMap map[string]POSID

POSMap represents a part of speech control table.

func (POSMap) Add

func (p POSMap) Add(pos []string) POS

Add adds a part-of-speech item to the POS control table and returns its IDs.

func (POSMap) List

func (p POSMap) List() []string

List returns a list whose index is POS ID and value is its name.
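The relationship between Add and List can be sketched as string interning: each distinct name receives an ID, and List inverts the mapping. The types and methods below are local stand-ins, and starting IDs at 1 (leaving index 0 empty) is an assumption of this sketch rather than a documented guarantee.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented POSID, POS, and POSMap types.
type posID int16
type pos []posID
type posMap map[string]posID

// add interns each name, assigning the next free ID to unseen names.
// Assumption: IDs start at 1.
func (p posMap) add(names []string) pos {
	ret := make(pos, 0, len(names))
	for _, name := range names {
		id, ok := p[name]
		if !ok {
			id = posID(len(p)) + 1
			p[name] = id
		}
		ret = append(ret, id)
	}
	return ret
}

// list returns a slice whose index is the ID and whose value is the name.
func (p posMap) list() []string {
	ret := make([]string, len(p)+1)
	for name, id := range p {
		ret[id] = name
	}
	return ret
}

func main() {
	p := posMap{}
	fmt.Println(p.add([]string{"noun", "general"})) // prints "[1 2]"
	fmt.Println(p.add([]string{"noun", "proper"}))  // prints "[1 3]"
	fmt.Printf("%q\n", p.list())                    // prints ["" "noun" "general" "proper"]
}
```

Interning keeps each morph's part-of-speech vector small, since repeated names such as "noun" are stored once and referenced by ID.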

type POSTable

type POSTable struct {
	POSs     []POS
	NameList []string
}

POSTable represents a table for managing parts of speech.

func ReadPOSTable

func ReadPOSTable(r io.Reader) (POSTable, error)

ReadPOSTable loads a POS table.

func (POSTable) WriteTo

func (p POSTable) WriteTo(w io.Writer) (int64, error)

WriteTo saves a POS table.

type Trie

type Trie interface {
	Search(input string) []int32
	PrefixSearch(input string) (length int, output []int32)
	CommonPrefixSearch(input string) (lens []int, outputs [][]int32)
	CommonPrefixSearchCallback(input string, callback func(id, l int))
}

Trie is an interface representing retrieval ability.

type UserDic

type UserDic struct {
	Index    IndexTable
	Contents []UserDicContent
}

UserDic represents a user dictionary.

func NewUserDic

func NewUserDic(path string) (udic *UserDic, err error)

NewUserDic builds a user dictionary from a file.

type UserDicContent

type UserDicContent struct {
	Tokens []string
	Yomi   []string
	Pos    string
}

UserDicContent represents contents of a word in a user dictionary.
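As an illustration of how a four-column record might map onto UserDicContent, the sketch below parses one line. The column order (surface form, space-separated tokens, space-separated readings, part of speech) and the sample line are assumptions, and `parseRecord` is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// userDicColumnSize mirrors the documented UserDicColumnSize constant.
const userDicColumnSize = 4

// userDicContent mirrors the documented UserDicContent fields.
type userDicContent struct {
	Tokens []string
	Yomi   []string
	Pos    string
}

// parseRecord splits one comma-separated record.
// Assumed columns: surface, tokens, yomi, pos.
func parseRecord(line string) (string, userDicContent, error) {
	var c userDicContent
	cols := strings.Split(line, ",")
	if len(cols) != userDicColumnSize {
		return "", c, fmt.Errorf("want %d columns, got %d", userDicColumnSize, len(cols))
	}
	tokens := strings.Fields(cols[1])
	yomi := strings.Fields(cols[2])
	if len(tokens) != len(yomi) {
		return "", c, fmt.Errorf("token count %d does not match yomi count %d", len(tokens), len(yomi))
	}
	c = userDicContent{Tokens: tokens, Yomi: yomi, Pos: cols[3]}
	return cols[0], c, nil
}

func main() {
	line := "日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞"
	surface, c, err := parseRecord(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(surface, c.Tokens, c.Yomi, c.Pos)
}
```

In this reading, the surface form would serve as the keyword for the index, while the remaining three columns populate the content entry for that keyword.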
