spell

package
v0.0.2
Warning

This package is not in the latest version of its module.

Published: Jan 17, 2024 License: BSD-3-Clause Imports: 17 Imported by: 1

README

spell

spell is a spell checking package, originally based on https://github.com/sajari/fuzzy

Documentation

Overview

Package spell provides functions for spell check and correction. It wraps https://github.com/sajari/fuzzy as the core spelling engine.

The package manages a single, globally usable spelling dictionary.


Constants

const (
	SpellDepthDefault              = 2
	SpellThresholdDefault          = 5
	SuffDivergenceThresholdDefault = 100
)
const (
	MethodIsWord                   Method = 0
	MethodSuggestMapsToInput              = 1
	MethodInputDeleteMapsToDict           = 2
	MethodInputDeleteMapsToSuggest        = 3
)
const SaveAfterLearnIntervalSecs = 20

SaveAfterLearnIntervalSecs is the number of seconds since the file was last opened or saved, above which the model is saved after learning.
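The interval test described above might look like the following minimal sketch. The helper names elapsedExceedsInterval and shouldSaveAfterLearn are hypothetical and not part of the package; only the value 20 comes from the documented constant.

```go
package main

import (
	"fmt"
	"time"
)

// saveAfterLearnIntervalSecs mirrors the documented value of
// SaveAfterLearnIntervalSecs.
const saveAfterLearnIntervalSecs = 20

// elapsedExceedsInterval reports whether the given number of seconds is
// above the save interval. Hypothetical helper, not part of the package.
func elapsedExceedsInterval(secs float64) bool {
	return secs > saveAfterLearnIntervalSecs
}

// shouldSaveAfterLearn applies the same test to two timestamps.
// Hypothetical helper, not part of the package.
func shouldSaveAfterLearn(lastSaved, now time.Time) bool {
	return elapsedExceedsInterval(now.Sub(lastSaved).Seconds())
}

func main() {
	opened := time.Now().Add(-30 * time.Second)
	// About 30 seconds have passed, which is above the 20 second interval.
	fmt.Println(shouldSaveAfterLearn(opened, time.Now()))
}
```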

Variables

var (
	Ignore = map[string]struct{}{}
)

Functions

func CheckIgnore

func CheckIgnore(word string) bool

CheckIgnore returns true if the word is found in the Ignore list

func CheckLexLine

func CheckLexLine(src []rune, tags lex.Line) lex.Line

CheckLexLine returns the Lex regions for any misspelled words within the given line of text, using its existing Lex tags. It automatically excludes any Code token regions (see token.IsCode). Token is set to token.TextSpellErr on the returned Lex elements.

func CheckWord

func CheckWord(word string) ([]string, bool)

CheckWord checks a single word and returns suggestions if the word is unknown.

func Complete

func Complete(s string) []string

Complete finds possible completions based on the prefix s

func Edits1

func Edits1(word string) []string

Edits1 creates a set of terms that are each one character deletion away from the input term.
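To illustrate what "one character deletion away" means, here is a minimal standalone sketch. The function deletes1 is a hypothetical stand-in written for this example; it is not the package's implementation and may differ from Edits1 in details such as ordering or duplicate handling.

```go
package main

import "fmt"

// deletes1 returns every string obtained by deleting exactly one
// character (rune) from word. Hypothetical stand-in for illustration.
func deletes1(word string) []string {
	runes := []rune(word)
	out := make([]string, 0, len(runes))
	for i := range runes {
		// Join everything before position i with everything after it.
		out = append(out, string(runes[:i])+string(runes[i+1:]))
	}
	return out
}

func main() {
	fmt.Println(deletes1("cat")) // [at ct ca]
}
```

Note that this sketch does not deduplicate: a word with repeated letters, such as "aa", yields the same term more than once.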

func IgnoreWord

func IgnoreWord(word string)

IgnoreWord adds the word to the Ignore list
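The Ignore variable is declared as map[string]struct{}, which is the usual Go idiom for a set. The sketch below shows how CheckIgnore and IgnoreWord could operate on such a set; the lowercase names are local stand-ins defined for this example, and whether the real functions normalize case is not stated here.

```go
package main

import "fmt"

// ignore is a local stand-in for the package-level Ignore variable.
// Empty struct values take no space, so the map behaves as a set.
var ignore = map[string]struct{}{}

// ignoreWord adds word to the set (stand-in for IgnoreWord).
func ignoreWord(word string) {
	ignore[word] = struct{}{}
}

// checkIgnore reports whether word is in the set (stand-in for CheckIgnore).
func checkIgnore(word string) bool {
	_, ok := ignore[word]
	return ok
}

func main() {
	ignoreWord("golang")
	fmt.Println(checkIgnore("golang")) // true
	fmt.Println(checkIgnore("gopher")) // false
}
```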

func Initialized

func Initialized() bool

Initialized returns true if the model has been loaded or created anew

func LearnWord

func LearnWord(word string)

LearnWord adds a single word to the corpus. This is deterministic: the threshold is set to 1 so that the word is learned immediately.

func Levenshtein

func Levenshtein(a, b *string) int

Levenshtein calculates the Levenshtein distance between two strings.
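For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The following is a standard two-row dynamic programming sketch; it takes plain strings rather than the *string parameters of the package function, and it is not the package's own code.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes the edit distance between a and b using two
// rolling rows of the classic dynamic programming table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
}
```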

func ModTime

func ModTime(path string) (time.Time, error)

ModTime returns the modification time of the given file path.

func Open

func Open(path string) error

Open loads the saved model, which is stored in JSON format.

func OpenCheck

func OpenCheck() error

OpenCheck checks whether the current file has been modified since it was last opened and, if so, re-opens it. Call this before checking.
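The modification check behind ModTime and OpenCheck can be sketched with os.Stat from the standard library. The helpers modTime and modifiedSince below are hypothetical and written for this example; how the package stores its last open time is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// modTime returns the modification time of the file at path.
func modTime(path string) (time.Time, error) {
	info, err := os.Stat(path)
	if err != nil {
		return time.Time{}, err
	}
	return info.ModTime(), nil
}

// modifiedSince reports whether the file at path was modified after
// lastOpen. A caller would re-open the model when this returns true.
func modifiedSince(path string, lastOpen time.Time) (bool, error) {
	mt, err := modTime(path)
	if err != nil {
		return false, err
	}
	return mt.After(lastOpen), nil
}

func main() {
	f, err := os.CreateTemp("", "spell-*.json")
	if err != nil {
		fmt.Println("temp file:", err)
		return
	}
	defer os.Remove(f.Name())
	f.Close()

	// Pretend the model was last opened an hour ago; the file is newer.
	changed, err := modifiedSince(f.Name(), time.Now().Add(-time.Hour))
	fmt.Println(changed, err)
}
```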

func OpenDefault

func OpenDefault() error

OpenDefault loads the default spelling file. TODO: need different languages obviously!

func OpenEmbed

func OpenEmbed(fname string) error

OpenEmbed loads a JSON-formatted model from embedded data.

func ResetLearnTime

func ResetLearnTime()

func SampleEnglish

func SampleEnglish() []string

func Save

func Save(filename string) error

Save saves the spelling model, which includes the data and parameters. Note: this overwrites any existing file, so be sure to have opened the current file before making any changes.

func SaveIfLearn

func SaveIfLearn() error

SaveIfLearn saves the spelling model to the file path used in the last Open command, if learning has occurred since the last save or open. If there are no changes, it also checks whether the file has been modified and re-opens it if so.

func Train

func Train(file os.File, new bool) (err error)

Train trains the model based on a text file

func UnLearnWord

func UnLearnWord(word string)

UnLearnWord removes a word from the dictionary, in case it was added accidentally.

Types

type Autos

type Autos struct {
	Results []string
	Model   *Model
}

Autos is used for sorting autocomplete suggestions so that the most popular come first.

func (Autos) Len

func (a Autos) Len() int

func (Autos) Less

func (a Autos) Less(i, j int) bool

func (Autos) Swap

func (a Autos) Swap(i, j int)
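Len, Less, and Swap together satisfy sort.Interface from the standard library. A minimal sketch of popularity-biased sorting follows; the byPopularity type and its plain count map are assumptions made for this example, since the real Autos type looks counts up through its Model field.

```go
package main

import (
	"fmt"
	"sort"
)

// byPopularity orders results so that terms with higher counts come
// first. Hypothetical type for illustration.
type byPopularity struct {
	results []string
	counts  map[string]int
}

func (a byPopularity) Len() int { return len(a.results) }

func (a byPopularity) Swap(i, j int) {
	a.results[i], a.results[j] = a.results[j], a.results[i]
}

// Less puts the more popular term first.
func (a byPopularity) Less(i, j int) bool {
	return a.counts[a.results[i]] > a.counts[a.results[j]]
}

// sortByPopularity returns a sorted copy of results.
func sortByPopularity(results []string, counts map[string]int) []string {
	out := append([]string(nil), results...)
	sort.Stable(byPopularity{results: out, counts: counts})
	return out
}

func main() {
	counts := map[string]int{"spell": 10, "spelling": 7, "spelt": 2}
	fmt.Println(sortByPopularity([]string{"spelt", "spell", "spelling"}, counts))
	// [spell spelling spelt]
}
```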

type Counts

type Counts struct {
	Corpus int `json:"c"`
	Query  int `json:"q"`
}

Counts has the individual word counts

type Method

type Method int

func (Method) String

func (m Method) String() string

type Model

type Model struct {
	Data                    map[string]*Counts  `json:"data"`
	Maxcount                int                 `json:"maxcount"`
	Suggest                 map[string][]string `json:"suggest"`
	Depth                   int                 `json:"depth"`
	Threshold               int                 `json:"threshold"`
	UseAutocomplete         bool                `json:"autocomplete"`
	SuffDivergence          int                 `json:"-"`
	SuffDivergenceThreshold int                 `json:"suff_threshold"`
	SuffixArr               *suffixarray.Index  `json:"-"`
	SuffixArrConcat         string              `json:"-"`
	sync.RWMutex
}

Model is the full data model
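The struct tags determine the JSON layout of a saved model: tagged fields are written under their short names, and fields tagged `json:"-"` are skipped entirely. The sketch below uses a trimmed local copy of a few fields (miniModel and counts are hypothetical types for this example) to show the effect with encoding/json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// counts mirrors the tags of the documented Counts type.
type counts struct {
	Corpus int `json:"c"`
	Query  int `json:"q"`
}

// miniModel is a trimmed, hypothetical copy of a few Model fields.
type miniModel struct {
	Data                    map[string]*counts `json:"data"`
	Depth                   int                `json:"depth"`
	Threshold               int                `json:"threshold"`
	SuffDivergence          int                `json:"-"` // never serialized
	SuffDivergenceThreshold int                `json:"suff_threshold"`
}

// encode marshals the model to a JSON string.
func encode(m miniModel) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

// decode unmarshals a JSON string back into a model.
func decode(s string) (miniModel, error) {
	var m miniModel
	err := json.Unmarshal([]byte(s), &m)
	return m, err
}

func main() {
	m := miniModel{
		Data:                    map[string]*counts{"go": {Corpus: 3, Query: 1}},
		Depth:                   2,
		Threshold:               5,
		SuffDivergence:          7,
		SuffDivergenceThreshold: 100,
	}
	s, err := encode(m)
	fmt.Println(s, err)
	// {"data":{"go":{"c":3,"q":1}},"depth":2,"threshold":5,"suff_threshold":100} <nil>
}
```

Because SuffDivergence is excluded, it comes back as zero after a round trip and has to be rebuilt at load time.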

func FromReader

func FromReader(r io.Reader) (*Model, error)

FromReader loads a model from a Reader

func Load

func Load(filename string) (*Model, error)

Load a saved model from disk

func NewModel

func NewModel() *Model

Create and initialise a new model

func (*Model) Autocomplete

func (md *Model) Autocomplete(input string) ([]string, error)

For a given string, autocomplete using the suffix array model
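As a rough illustration of prefix completion with a suffix array, the sketch below joins the vocabulary with a separator byte and looks up "separator + prefix" using index/suffixarray from the standard library. The function complete, the NUL separator, and the alphabetical ordering are assumptions for this example; the package's own method may build and query its index differently.

```go
package main

import (
	"fmt"
	"index/suffixarray"
	"sort"
	"strings"
)

// sep separates words in the concatenated text. It is assumed never to
// occur inside a word.
const sep = "\x00"

// complete returns all words that start with prefix, in sorted order.
// Hypothetical helper for illustration.
func complete(words []string, prefix string) []string {
	concat := sep + strings.Join(words, sep) + sep
	idx := suffixarray.New([]byte(concat))

	// Every match of sep+prefix marks the start of a candidate word.
	offsets := idx.Lookup([]byte(sep+prefix), -1)
	out := make([]string, 0, len(offsets))
	for _, off := range offsets {
		start := off + len(sep)
		end := strings.Index(concat[start:], sep)
		if end <= 0 {
			continue // trailing separator or empty word
		}
		out = append(out, concat[start:start+end])
	}
	sort.Strings(out) // Lookup returns offsets in no particular order
	return out
}

func main() {
	words := []string{"spell", "spelling", "speak", "check"}
	fmt.Println(complete(words, "spel")) // [spell spelling]
}
```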

func (*Model) CheckKnown

func (md *Model) CheckKnown(input string, correct string) bool

CheckKnown tests an input and, if the guess is wrong, looks at why it is wrong. It returns a bool indicating whether the guess was correct. Typically this function would be used for testing, not for production.

func (*Model) Delete

func (md *Model) Delete(term string)

Delete removes the given word from the dictionary, undoing learning.

func (*Model) EditsMulti

func (md *Model) EditsMulti(term string, depth int) []string

Edits at any depth for a given term. The depth of the model is used
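Assuming the edits in question are character deletions, as in Edits1, multi-depth edits can be sketched by applying single deletions repeatedly. The function deleteEdits is hypothetical; deduplication and sorted output are choices made for this example, not documented behavior of EditsMulti.

```go
package main

import (
	"fmt"
	"sort"
)

// deleteEdits returns every distinct term reachable from term by
// deleting between 1 and depth characters. Hypothetical helper.
func deleteEdits(term string, depth int) []string {
	seen := map[string]struct{}{}
	frontier := []string{term}
	for d := 0; d < depth; d++ {
		var next []string
		for _, t := range frontier {
			r := []rune(t)
			for i := range r {
				e := string(r[:i]) + string(r[i+1:])
				if _, ok := seen[e]; ok {
					continue // already produced by another path
				}
				seen[e] = struct{}{}
				next = append(next, e)
			}
		}
		frontier = next
	}
	out := make([]string, 0, len(seen))
	for e := range seen {
		out = append(out, e)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(deleteEdits("abc", 2)) // [a ab ac b bc c]
}
```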

func (*Model) Init

func (md *Model) Init() *Model

func (*Model) Potentials

func (md *Model) Potentials(input string, exhaustive bool) map[string]*Potential

Return the raw potential terms so they can be ranked externally to this package
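One way to rank the raw potentials outside the package is sketched below. The local potential struct copies three of the documented fields, and the ordering (smaller edit distance first, then higher score, then alphabetical) is an assumption for this example rather than the package's own ranking.

```go
package main

import (
	"fmt"
	"sort"
)

// potential is a local copy of three documented Potential fields.
type potential struct {
	Term  string
	Score int
	Leven int
}

// rank orders terms by ascending edit distance, then descending score,
// then alphabetically so the result is deterministic.
func rank(pots map[string]*potential) []string {
	list := make([]*potential, 0, len(pots))
	for _, p := range pots {
		list = append(list, p)
	}
	sort.Slice(list, func(i, j int) bool {
		if list[i].Leven != list[j].Leven {
			return list[i].Leven < list[j].Leven
		}
		if list[i].Score != list[j].Score {
			return list[i].Score > list[j].Score
		}
		return list[i].Term < list[j].Term
	})
	out := make([]string, len(list))
	for i, p := range list {
		out[i] = p.Term
	}
	return out
}

func main() {
	pots := map[string]*potential{
		"spill": {Term: "spill", Score: 90, Leven: 2},
		"spelt": {Term: "spelt", Score: 10, Leven: 1},
		"spell": {Term: "spell", Score: 50, Leven: 1},
	}
	fmt.Println(rank(pots)) // [spell spelt spill]
}
```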

func (*Model) Save

func (md *Model) Save(filename string) error

Save a spelling model to disk

func (*Model) SaveLight

func (md *Model) SaveLight(filename string) error

SaveLight saves a spelling model to disk but discards all entries with fewer than the threshold number of occurrences. The result is much smaller and is all that is needed when the model is generated as a one-off, but it is not useful for incremental usage.

func (*Model) SetCount

func (md *Model) SetCount(term string, count int, suggest bool)

Manually set the count of a word. Optionally trigger the creation of suggestion keys for the term. This function lets you build a model from an existing dictionary with word popularity counts without needing to run "TrainWord" repeatedly

func (*Model) SetDepth

func (md *Model) SetDepth(val int)

Change the default depth value of the model. This sets how many character differences are indexed. The default is 2.

func (*Model) SetDivergenceThreshold

func (md *Model) SetDivergenceThreshold(val int)

Optionally set the suffix array divergence threshold. This is the number of query training steps between rebuilds of the suffix array. A low number will be more accurate but will use more resources and create more garbage.

func (*Model) SetThreshold

func (md *Model) SetThreshold(val int)

Change the default threshold of the model. This is how many times a term must be seen before suggestions are created for it
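The effect of the threshold can be sketched as follows: a term is counted every time it is seen, but its suggestion keys are only created once the count reaches the threshold. Everything in this example (thresholdModel, trainWord, the use of single-deletion keys) is a hypothetical simplification, not the package's implementation.

```go
package main

import "fmt"

// deletes1 returns all terms one character deletion away from word.
func deletes1(word string) []string {
	r := []rune(word)
	out := make([]string, 0, len(r))
	for i := range r {
		out = append(out, string(r[:i])+string(r[i+1:]))
	}
	return out
}

// thresholdModel is a hypothetical, stripped-down model.
type thresholdModel struct {
	counts    map[string]int
	suggest   map[string][]string // deletion key -> known terms
	threshold int
}

func newThresholdModel(threshold int) *thresholdModel {
	return &thresholdModel{
		counts:    map[string]int{},
		suggest:   map[string][]string{},
		threshold: threshold,
	}
}

// trainWord counts the term and creates its suggestion keys exactly
// once, at the moment the count reaches the threshold.
func (m *thresholdModel) trainWord(term string) {
	m.counts[term]++
	if m.counts[term] == m.threshold {
		for _, key := range deletes1(term) {
			m.suggest[key] = append(m.suggest[key], term)
		}
	}
}

func main() {
	m := newThresholdModel(2)
	m.trainWord("cat")
	fmt.Println(len(m.suggest)) // 0: seen once, below the threshold
	m.trainWord("cat")
	fmt.Println(m.suggest["ct"]) // [cat]
}
```

With a threshold of 1, a single call is enough to create the keys, which matches the idea of learning a word immediately.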

func (*Model) SetUseAutocomplete

func (md *Model) SetUseAutocomplete(val bool)

Optionally disable suffixarray-based autocomplete support.

func (*Model) SpellCheck

func (md *Model) SpellCheck(input string) string

Return the most likely correction for the input term

func (*Model) SpellCheckSuggestions

func (md *Model) SpellCheckSuggestions(input string, n int) []string

Return the most likely corrections in order from best to worst

func (*Model) Suggestions

func (md *Model) Suggestions(input string, exhaustive bool) []string

For a given input string, suggests potential replacements

func (*Model) Train

func (md *Model) Train(terms []string)

Add an array of words to train the model in bulk

func (*Model) TrainQuery

func (md *Model) TrainQuery(term string)

TrainQuery using a search query term. This builds a second popularity index of terms used to search, as opposed to generally occurring in corpus text

func (*Model) TrainWord

func (md *Model) TrainWord(term string)

Train the model word by word. This is corpus training as opposed to query training. Word counts from this type of training are not likely to correlate with those of search queries

func (*Model) WriteTo

func (md *Model) WriteTo(w io.Writer) error

WriteTo writes a model to a Writer

type Pair

type Pair struct {
	// contains filtered or unexported fields
}

type Potential

type Potential struct {
	Term   string // Potential term string
	Score  int    // Score
	Leven  int    // Levenshtein distance from the suggestion to the input
	Method Method // How this potential was matched
}

Potential is a potential match

func (*Potential) String

func (pot *Potential) String() string
