darts

package module
v0.0.0-...-1b63d94 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 24, 2019 License: Apache-2.0 Imports: 8 Imported by: 2

README

Darts

This is a GO implementation of Double-ARray Trie System. It's a clone of the C++ version

Darts can be used as simple hash dictionary. You can also do very fast Common Prefix Search which is essential for morphological analysis, such as word split for CJK text indexing/searching.

Reference

What is Trie
An Implementation of Double-Array Trie

NEWS

  • Support building Double-Array from DAWG有向无环图, reduce the on-disk dict half as Trie. Lookup performance increases 25%.

TO DO list

  • Documentation/comments
  • Benchmark

Switch from unicode to byte version

gofmt -tabs=false -tabwidth=4 -r='rune /*Key_type*/ -> byte /*Key_type*/' -w darts.go
gofmt -tabs=false -tabwidth=4 -r='rune /*Key_type*/ -> byte /*Key_type*/' -w dawg.go

Usage

Input Dictionary Format

Key\tFreq

Each key occupies one line. The file should be utf-8 encoded

Code example (unicode version)

package main

import (
    "darts"
    "fmt"
)

func main() {
    d, err:= darts.Import("darts.txt", "darts.lib", true)
    if err == nil {
        if d.ExactMatchSearch([]rune("考察队员", 0)) {
            fmt.Println("考察队员 is in dictionary")
        }
    }
}

Code example (byte version)

package main

import (
    "darts"
    "fmt"
)

func main() {
    d, err := darts.Import("darts.txt", "darts.lib", true)
    if err == nil {
        key := []byte("考察队员")
        r := d.CommonPrefixSearch(key, 0)
        for i := 0; i < len(r); i++ {
            fmt.Println(string(key[:r[i].PrefixLen]))
        }
    }
}

Performance

Using a 100K item dictionary, a simple search on eath key takes go map 46 ms, takes byte_key version of darts 14 ms, and for unicode_key version of darts 9.5 ms.

LICENSE

Apache License 2.0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Darts

type Darts struct {
	Base      []int
	Check     []int
	ValuePool []Value
}

func Build

func Build(key [][]rune, freq []int) Darts

variable key should be sorted ascendingly

func BuildFromDAWG

func BuildFromDAWG(keys [][]rune, freq []int) Darts

func Import

func Import(inFile, outFile string, useDAWG bool) (Darts, error)

func Load

func Load(filename string) (Darts, error)

func (Darts) CommonPrefixSearch

func (d Darts) CommonPrefixSearch(key []rune, nodePos int) (results []ResultPair)

func (Darts) ExactMatchSearch

func (d Darts) ExactMatchSearch(key []rune, nodePos int) bool

func (Darts) UpdateThesaurus

func (d Darts) UpdateThesaurus(keys [][]rune)

type Pair

type Pair struct {
	Char rune /*Key_type*/
	// contains filtered or unexported fields
}

type PairList

type PairList []Pair

A slice of Pairs that implements sort.Interface to sort by Value.

func (PairList) Len

func (p PairList) Len() int

func (PairList) Less

func (p PairList) Less(i, j int) bool

func (PairList) Swap

func (p PairList) Swap(i, j int)

type ResultPair

type ResultPair struct {
	PrefixLen int
	Value
}

type SubWord

type SubWord struct {
	OffSet int
	Len    int
}

type Value

type Value struct {
	Freq     int
	SubWords []SubWord
}

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL