dic

Version: v1.5.2

Note: this package is not in the latest version of its module.
Published: Jul 28, 2016 | License: Apache-2.0 | Imports: 15 | Imported by: 0

Documentation

Overview

Package dic implements the dictionary of the morphological analyzer.

Constants

const (
	// IPADicPath represents the internal IPA dictionary path.
	IPADicPath = "dic/ipa"
	// UniDicPath represents the internal UniDic dictionary path.
	UniDicPath = "dic/uni"
)
const UserDicColumnSize = 4

UserDicColumnSize is the column size of the user dictionary.

Variables

This section is empty.

Functions

func NewContents added in v1.3.0

func NewContents(b []byte) [][]string

NewContents creates dictionary contents from a byte slice.

Types

type ConnectionTable

type ConnectionTable struct {
	Row, Col int64
	Vec      []int16
}

ConnectionTable represents a connection matrix of morphs.

func LoadConnectionTable

func LoadConnectionTable(r io.Reader) (t ConnectionTable, err error)

LoadConnectionTable loads ConnectionTable from io.Reader.

func (*ConnectionTable) At

func (t *ConnectionTable) At(row, col int) int16

At returns the connection cost of matrix[row, col].

func (ConnectionTable) WriteTo

func (t ConnectionTable) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface.

type Contents added in v1.3.0

type Contents [][]string

Contents represents dictionary contents.

func (Contents) WriteTo added in v1.3.0

func (c Contents) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface.

type Dic

type Dic struct {
	Morphs       []Morph
	Contents     [][]string
	Connection   ConnectionTable
	Index        IndexTable
	CharClass    []string
	CharCategory []byte
	InvokeList   []bool
	GroupList    []bool
	UnkMorphs    []Morph
	UnkIndex     map[int32]int32
	UnkIndexDup  map[int32]int32
	UnkContents  [][]string
}

Dic represents a dictionary of a tokenizer.

func Load

func Load(path string) (d *Dic, err error)

Load loads a dictionary from a file.

func SysDic

func SysDic() *Dic

SysDic returns the kagome system dictionary.

func SysDicIPA

func SysDicIPA() *Dic

SysDicIPA returns the IPA system dictionary.

func SysDicUni added in v1.3.0

func SysDicUni() *Dic

SysDicUni returns the UniDic system dictionary.

func (Dic) CharacterCategory added in v1.4.0

func (d Dic) CharacterCategory(r rune) byte

CharacterCategory returns the category of a rune.

type IndexTable

type IndexTable struct {
	Da  da.DoubleArray
	Dup map[int32]int32
}

IndexTable represents a dictionary index.

func BuildIndexTable

func BuildIndexTable(sortedKeywords []string) (IndexTable, error)

BuildIndexTable constructs an index table from keywords.

func ReadIndexTable

func ReadIndexTable(r io.Reader) (IndexTable, error)

ReadIndexTable loads an index table.

func (IndexTable) CommonPrefixSearch

func (idx IndexTable) CommonPrefixSearch(input string) (lens []int, ids [][]int)

CommonPrefixSearch finds the keywords that are prefixes of the input and returns their ids and lengths, if any are found.

func (IndexTable) CommonPrefixSearchCallback added in v1.5.1

func (idx IndexTable) CommonPrefixSearchCallback(input string, callback func(id, l int))

CommonPrefixSearchCallback finds the keywords that are prefixes of the input and invokes the callback with the id and length of each match.

func (IndexTable) Search

func (idx IndexTable) Search(input string) []int

Search finds the given keyword and returns the id if found.

func (IndexTable) WriteTo

func (idx IndexTable) WriteTo(w io.Writer) (n int64, err error)

WriteTo saves an index table.

type Morph

type Morph struct {
	LeftID, RightID, Weight int16
}

Morph represents parts of speech and an occurrence cost.

func LoadMorphSlice

func LoadMorphSlice(r io.Reader) ([]Morph, error)

LoadMorphSlice loads morph data from io.Reader.

type MorphSlice

type MorphSlice []Morph

MorphSlice represents a slice of morphs.

func (MorphSlice) WriteTo

func (m MorphSlice) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface.

type Trie

type Trie interface {
	Search(input string) []int32
	PrefixSearch(input string) (length int, output []int32)
	CommonPrefixSearch(input string) (lens []int, outputs [][]int32)
	CommonPrefixSearchCallback(input string, callback func(id, l int))
}

Trie is an interface representing retrieval ability.

type UserDic

type UserDic struct {
	Index    IndexTable
	Contents []UserDicContent
}

UserDic represents a user dictionary.

func NewUserDic

func NewUserDic(path string) (udic *UserDic, err error)

NewUserDic builds a user dictionary from a file.

type UserDicContent

type UserDicContent struct {
	Tokens []string
	Yomi   []string
	Pos    string
}

UserDicContent represents contents of a word in a user dictionary.
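A minimal sketch of how one user dictionary record could map onto these fields follows. It assumes a comma-separated line with UserDicColumnSize (4) columns in the order surface form, space-separated tokens, space-separated readings, part of speech. The column order, the separators, and the `parseUserDicLine` helper are assumptions for illustration; the page itself only documents the column count and the struct fields.

```go
package main

import (
	"fmt"
	"strings"
)

// userDicColumnSize matches the documented UserDicColumnSize constant.
const userDicColumnSize = 4

// userDicContent mirrors the documented fields of UserDicContent.
type userDicContent struct {
	Tokens []string
	Yomi   []string
	Pos    string
}

// parseUserDicLine is a hypothetical helper that splits one record.
// Assumed layout: surface,tokens,yomi,pos with space-separated lists.
func parseUserDicLine(line string) (string, userDicContent, error) {
	cols := strings.Split(line, ",")
	if len(cols) != userDicColumnSize {
		return "", userDicContent{}, fmt.Errorf("want %d columns, got %d", userDicColumnSize, len(cols))
	}
	tokens := strings.Fields(cols[1])
	yomi := strings.Fields(cols[2])
	if len(tokens) != len(yomi) {
		return "", userDicContent{}, fmt.Errorf("%d tokens but %d readings", len(tokens), len(yomi))
	}
	return cols[0], userDicContent{Tokens: tokens, Yomi: yomi, Pos: cols[3]}, nil
}

func main() {
	surface, c, err := parseUserDicLine("日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞")
	if err != nil {
		panic(err)
	}
	fmt.Println(surface, len(c.Tokens), len(c.Yomi), c.Pos)
}
```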
