ngram

package module
v0.0.0-...-80eaf16 Latest
Warning

This package is not in the latest version of its module.

Published: May 27, 2016 License: MIT Imports: 6 Imported by: 1

README

go-ngram

N-gram index for Go.

Key features

  • Unicode support.
  • Append-only: data can't be deleted from the index.
  • GC-friendly (all strings are pooled and compressed).
  • Application-agnostic (there is no notion of a document, and nothing else the user needs to implement).

Usage

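// Create a trigram index (N = 3); error checks are omitted for brevity.
index, err := ngram.NewNGramIndex(ngram.SetN(3))
// Add a token; the returned ID identifies it in the index.
tokenID, err := index.Add("hello")
// Convert the ID back to the original string.
str, err := index.GetString(tokenID) // str == "hello"
// Search returns the indexed tokens that are similar to the query.
resultsList, err := index.Search("world")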

TODO:

  • Smoothing functions (Laplace etc)


Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type NGramIndex

type NGramIndex struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

NGramIndex can be used as a zero value or created with NewNGramIndex.

func NewNGramIndex

func NewNGramIndex(opts ...Option) (*NGramIndex, error)

NewNGramIndex is the N-gram index constructor. In most cases it should be called without parameters. To customize the index, pass options created with the functions SetPad, SetWarp and SetN.
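
The constructor follows the functional options pattern: each Option is a function that adjusts the index and may return an error. The following is a minimal, self-contained sketch of that pattern rather than the package's own code. The Index type, its field names, the default values and the n >= 2 check are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Index is a hypothetical stand-in for NGramIndex; field names are assumptions.
type Index struct {
	n    int
	pad  rune
	warp float64
}

// Option has the documented shape: a function that mutates the index
// and may reject an invalid value.
type Option func(*Index) error

// SetN sets the gram size. The n >= 2 check is an assumed validation rule.
func SetN(n int) Option {
	return func(ix *Index) error {
		if n < 2 {
			return errors.New("n must be at least 2")
		}
		ix.n = n
		return nil
	}
}

// SetPad sets the padding character.
func SetPad(c rune) Option {
	return func(ix *Index) error {
		ix.pad = c
		return nil
	}
}

// NewIndex starts from assumed defaults, then applies each option in order
// and stops at the first error.
func NewIndex(opts ...Option) (*Index, error) {
	ix := &Index{n: 3, pad: '$', warp: 1.0} // assumed defaults
	for _, opt := range opts {
		if err := opt(ix); err != nil {
			return nil, err
		}
	}
	return ix, nil
}

func main() {
	ix, err := NewIndex(SetN(2), SetPad('#'))
	fmt.Println(ix.n, string(ix.pad), err)
}
```

Calling NewIndex() with no arguments keeps every default, which matches the advice that the constructor is usually called without parameters.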

func (*NGramIndex) Add

func (ngram *NGramIndex) Add(input string) (TokenID, error)

Add adds a token to the index. It returns a token ID, which can be converted back to a string with GetString.

func (*NGramIndex) BestMatch

func (ngram *NGramIndex) BestMatch(input string, threshold ...float64) (*SearchResult, error)

BestMatch is the same as Search, except that it returns only the single best result instead of all matches.
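
Conceptually, picking the best match is a maximum over the results a search would return. The sketch below shows that selection on a plain slice. The best helper is hypothetical, and returning an error for an empty result set is an assumption of this sketch, not documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// TokenID and SearchResult mirror the documented type shapes.
type TokenID int

type SearchResult struct {
	TokenID    TokenID
	Similarity float64
}

// best returns the result with the highest similarity.
// Returning an error for an empty slice is an assumption of this sketch.
func best(results []SearchResult) (*SearchResult, error) {
	if len(results) == 0 {
		return nil, errors.New("no matches")
	}
	top := results[0]
	for _, r := range results[1:] {
		if r.Similarity > top.Similarity {
			top = r
		}
	}
	return &top, nil
}

func main() {
	r, err := best([]SearchResult{
		{TokenID: 0, Similarity: 0.4},
		{TokenID: 1, Similarity: 0.9},
	})
	fmt.Println(r.TokenID, r.Similarity, err)
}
```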

func (*NGramIndex) GetString

func (ngram *NGramIndex) GetString(id TokenID) (string, error)

GetString converts a token ID back to the string it was created from.
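
Add and GetString form a round trip between strings and integer IDs. One simple way to picture this is an append-only slice in which the ID is the position of the string. The sketch below illustrates only that idea: the pool type and its methods are hypothetical, and it does not reproduce the pooled and compressed storage the package describes.

```go
package main

import "fmt"

// TokenID mirrors the documented type: an integer handle for a stored string.
type TokenID int

// pool is a hypothetical append-only string store.
type pool struct {
	items []string
}

// add appends s and returns its position as the token ID.
func (p *pool) add(s string) TokenID {
	p.items = append(p.items, s)
	return TokenID(len(p.items) - 1)
}

// get converts a token ID back to its string and rejects unknown IDs.
func (p *pool) get(id TokenID) (string, error) {
	if id < 0 || int(id) >= len(p.items) {
		return "", fmt.Errorf("unknown token id %d", id)
	}
	return p.items[id], nil
}

func main() {
	var p pool
	id := p.add("hello")
	s, err := p.get(id)
	fmt.Println(id, s, err)
}
```

Because nothing is ever removed, an ID stays valid for the lifetime of the store, which is what makes an append-only design convenient here.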

func (*NGramIndex) Search

func (ngram *NGramIndex) Search(input string, threshold ...float64) ([]SearchResult, error)

Search looks for matches between the query string (input) and the indexed strings. The optional threshold parameter sets the minimal similarity between the input string and a matching string; at most one threshold value may be passed. The result is an unordered slice of SearchResult structs, each containing a token ID and a similarity value (a float64 from threshold to 1.0).
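
To make the threshold semantics concrete, here is a small self-contained sketch of a thresholded search. It is not the package's algorithm: it scans a slice linearly instead of using an index, uses unpadded rune trigrams, and scores with Jaccard similarity as a stand-in measure. The helper names, the default threshold of 0 and the error on extra threshold values are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

type TokenID int

type SearchResult struct {
	TokenID    TokenID
	Similarity float64
}

// grams returns the set of rune trigrams of s (no padding, for brevity).
func grams(s string) map[string]bool {
	r := []rune(s)
	set := make(map[string]bool)
	for i := 0; i+3 <= len(r); i++ {
		set[string(r[i:i+3])] = true
	}
	return set
}

// jaccard is a stand-in similarity: shared trigrams / all distinct trigrams.
func jaccard(a, b map[string]bool) float64 {
	shared := 0
	for g := range a {
		if b[g] {
			shared++
		}
	}
	all := len(a) + len(b) - shared
	if all == 0 {
		return 0
	}
	return float64(shared) / float64(all)
}

// search keeps every token whose similarity is positive and at or above
// the threshold. At most one threshold value is accepted.
func search(tokens []string, query string, threshold ...float64) ([]SearchResult, error) {
	if len(threshold) > 1 {
		return nil, errors.New("at most one threshold may be given")
	}
	floor := 0.0 // assumed default
	if len(threshold) == 1 {
		floor = threshold[0]
	}
	q := grams(query)
	var out []SearchResult
	for id, tok := range tokens {
		if sim := jaccard(q, grams(tok)); sim > 0 && sim >= floor {
			out = append(out, SearchResult{TokenID: TokenID(id), Similarity: sim})
		}
	}
	return out, nil
}

func main() {
	tokens := []string{"hello", "help", "world"}
	res, err := search(tokens, "hello", 0.3)
	fmt.Println(res, err)
}
```

In this sketch "help" shares one trigram out of four with "hello" (similarity 0.25), so it is returned when no threshold is given and dropped at a threshold of 0.3.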

type Option

type Option func(*NGramIndex) error

func SetN

func SetN(n int) Option

SetN passes N (the gram size) to the NGramIndex constructor.

func SetPad

func SetPad(c rune) Option

SetPad passes the padding character to the NGramIndex constructor.
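
Padding lets the first and last characters of a string take part in as many n-grams as the characters in the middle. The sketch below shows one common scheme, assuming n >= 1 and n-1 padding characters on each side. The split helper and this exact padding rule are assumptions, not the package's confirmed behavior. Working on runes rather than bytes keeps multi-byte Unicode characters intact.

```go
package main

import (
	"fmt"
	"strings"
)

// split pads s with n-1 copies of pad on each side and returns every
// window of n runes. The padding width is an assumption of this sketch.
func split(s string, n int, pad rune) []string {
	p := strings.Repeat(string(pad), n-1)
	r := []rune(p + s + p) // runes, so "é" counts as one character
	var out []string
	for i := 0; i+n <= len(r); i++ {
		out = append(out, string(r[i:i+n]))
	}
	return out
}

func main() {
	fmt.Println(split("ab", 3, '$'))
	fmt.Println(split("héllo", 3, '$'))
}
```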

func SetWarp

func SetWarp(warp float64) Option

SetWarp passes the warp value to the NGramIndex constructor.
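
This page does not define warp. In other n-gram libraries, a warp exponent reshapes the similarity score so that a warp above 1.0 is more forgiving of mismatches. The sketch below uses that formula purely as an assumption: (all^warp - diff^warp) / all^warp, where diff = all - same. The similarity helper is hypothetical and may not match what this package computes.

```go
package main

import (
	"fmt"
	"math"
)

// similarity applies a warp exponent to the share of matching n-grams.
// Assumed formula: (all^warp - diff^warp) / all^warp, with diff = all - same.
// With warp == 1.0 this reduces to same / all.
func similarity(same, all int, warp float64) float64 {
	if all == 0 {
		return 0
	}
	a := math.Pow(float64(all), warp)
	d := math.Pow(float64(all-same), warp)
	return (a - d) / a
}

func main() {
	fmt.Println(similarity(2, 4, 1.0)) // half of the n-grams match
	fmt.Println(similarity(2, 4, 2.0)) // same counts, higher warp
}
```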

type SearchResult

type SearchResult struct {
	TokenID    TokenID
	Similarity float64
}

SearchResult contains a token ID and a similarity value in the range from 0.0 to 1.0.

type TokenID

type TokenID int

TokenID is the ID of a token.
