ptpp

v1.1.0 (Latest)
Warning

This package is not in the latest version of its module.

Published: Aug 30, 2020 | License: MIT | Imports: 10 | Imported by: 0

README

Persian Text PreProcessor


Persian Text PreProcessor is a tool that helps search engines improve their results. Although the library is optimized for the Persian language, it can also be used for English or mixed-language text.

Getting Started

Use go get to add the PTPP library to your project:

$ go get -u gopkg.in/ptpp.v1

PTPP uses a simple in-memory model that must be trained before use. To train the model, pass it a list of correctly spelled words or phrases:

import "gopkg.in/ptpp.v1"

var processor ptpp.Processor
processor.Train([]string{
    "bass guitar",
    "garbage collector",
    // ... more correctly spelled words or phrases
})

The trained processor can now auto-correct misspelled words and form composite keywords from search phrases:

keywords, err := processor.Process(strings.NewReader("electric base guitarr"))
// On success (err == nil), keywords is []string{"electric", "bass guitar"}.

License

PTPP is published under the MIT license.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Levenshtein

func Levenshtein(v, w string) int

Levenshtein computes the Levenshtein distance between two words.
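For reference, the distance that Levenshtein reports can be computed with the classic dynamic-programming algorithm that keeps only two rows of the distance table. The sketch below is a standalone implementation, not the package's own code; the lowercase name levenshtein and the choice to compare runes rather than bytes (so that a multi-byte Persian letter counts as one character) are assumptions.

```go
package main

// levenshtein returns the minimum number of single-character insertions,
// deletions, and substitutions needed to turn v into w. It compares runes
// rather than bytes so that multi-byte letters count as one character.
func levenshtein(v, w string) int {
	a, b := []rune(v), []rune(w)
	// prev holds row i-1 of the distance table, curr holds row i.
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j // turning "" into b[:j] takes j insertions
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i // turning a[:i] into "" takes i deletions
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(
				prev[j]+1,      // deletion
				curr[j-1]+1,    // insertion
				prev[j-1]+cost, // substitution or match
			)
		}
		prev, curr = curr, prev // the finished row becomes the previous row
	}
	return prev[len(b)]
}

// min3 returns the smallest of three ints.
func min3(x, y, z int) int {
	m := x
	if y < m {
		m = y
	}
	if z < m {
		m = z
	}
	return m
}
```

For example, levenshtein("guitarr", "guitar") is 1 (one deletion), and levenshtein("کتاب", "کتب") is also 1 because each Persian letter is treated as a single rune.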

Types

type DefaultSemanticMatcher

type DefaultSemanticMatcher struct {
	// contains filtered or unexported fields
}

DefaultSemanticMatcher is a SemanticMatcher which uses a simple lookup table to find the best suggestion.

func (*DefaultSemanticMatcher) Load added in v1.1.0

func (sm *DefaultSemanticMatcher) Load(r io.Reader) error

Load restores the state of the semantic-matcher from r.

func (*DefaultSemanticMatcher) Match

func (sm *DefaultSemanticMatcher) Match(context string, suggestions []string) (string, bool)

Match finds the best suggestion based on the context.

func (*DefaultSemanticMatcher) Save added in v1.1.0

func (sm *DefaultSemanticMatcher) Save(w io.Writer) error

Save stores the state of the semantic-matcher into w.

func (*DefaultSemanticMatcher) Train

func (sm *DefaultSemanticMatcher) Train(context, word string)

Train trains the semantic models with a word and its context.
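The documentation does not say how DefaultSemanticMatcher's lookup table is organized. As an illustration only, here is a minimal matcher that satisfies the same interface, assuming the table maps each context string (for example, a neighboring word) to per-word occurrence counts. The type name lookupMatcher and this counting scheme are hypothetical, not the package's implementation.

```go
package main

// SemanticMatcher mirrors the ptpp.SemanticMatcher interface.
type SemanticMatcher interface {
	Match(context string, suggestions []string) (best string, matched bool)
	Train(context, word string)
}

// lookupMatcher is a hypothetical lookup-table matcher. It counts how often
// each word was trained with a given context and, when matching, prefers the
// suggestion with the highest count for that context.
type lookupMatcher struct {
	counts map[string]map[string]int // context -> word -> occurrences
}

// Compile-time check that lookupMatcher satisfies SemanticMatcher.
var _ SemanticMatcher = (*lookupMatcher)(nil)

// Train records one occurrence of word in context.
func (m *lookupMatcher) Train(context, word string) {
	if m.counts == nil {
		m.counts = make(map[string]map[string]int)
	}
	if m.counts[context] == nil {
		m.counts[context] = make(map[string]int)
	}
	m.counts[context][word]++
}

// Match returns the suggestion seen most often with context and true. If no
// suggestion was ever seen with context, it falls back to the first
// suggestion and false. Ties keep the earlier suggestion.
func (m *lookupMatcher) Match(context string, suggestions []string) (string, bool) {
	if len(suggestions) == 0 {
		return "", false
	}
	best, bestCount := suggestions[0], 0
	for _, s := range suggestions {
		// Reading from a nil map is safe in Go and yields 0.
		if c := m.counts[context][s]; c > bestCount {
			best, bestCount = s, c
		}
	}
	return best, bestCount > 0
}
```

Returning false on the fallback path matches the interface contract: matched is true only when the chosen suggestion actually correlates with the context.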

type DefaultSpellChecker

type DefaultSpellChecker struct {
	// contains filtered or unexported fields
}

DefaultSpellChecker is a SpellChecker which uses a distance model to find suggestions for a misspelled word.

func (*DefaultSpellChecker) Check

func (sc *DefaultSpellChecker) Check(word string) []string

Check finds correct spelling suggestions for a word.

func (*DefaultSpellChecker) Load added in v1.1.0

func (sc *DefaultSpellChecker) Load(r io.Reader) error

Load restores the state of the spell-checker from r.

func (*DefaultSpellChecker) Save added in v1.1.0

func (sc *DefaultSpellChecker) Save(w io.Writer) error

Save stores the state of the spell-checker into w.

func (*DefaultSpellChecker) Train

func (sc *DefaultSpellChecker) Train(words []string)

Train trains the suggestion model with a list of words.
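A distance-based checker can be sketched by keeping a vocabulary and suggesting the known words closest to the input. This sketch assumes rune-based Levenshtein distance and a linear scan over the whole vocabulary with no distance threshold; distanceChecker and distance are hypothetical names, and DefaultSpellChecker may use a different model.

```go
package main

import "sort"

// SpellChecker mirrors the ptpp.SpellChecker interface.
type SpellChecker interface {
	Check(word string) []string
	Train(words []string)
}

// distanceChecker is a hypothetical distance-based spell checker.
type distanceChecker struct {
	vocab map[string]bool
}

// Compile-time check that distanceChecker satisfies SpellChecker.
var _ SpellChecker = (*distanceChecker)(nil)

// Train adds words to the vocabulary.
func (c *distanceChecker) Train(words []string) {
	if c.vocab == nil {
		c.vocab = make(map[string]bool)
	}
	for _, w := range words {
		c.vocab[w] = true
	}
}

// Check returns the word itself if it is known (or if nothing is known, so
// the result is never empty); otherwise it returns every known word at the
// smallest distance, sorted for stable output.
func (c *distanceChecker) Check(word string) []string {
	if c.vocab[word] || len(c.vocab) == 0 {
		return []string{word}
	}
	best := -1
	var out []string
	for w := range c.vocab {
		d := distance(word, w)
		switch {
		case best < 0 || d < best:
			best, out = d, []string{w}
		case d == best:
			out = append(out, w)
		}
	}
	sort.Strings(out) // map iteration order is random
	return out
}

// distance is a compact single-row, rune-based Levenshtein distance.
func distance(v, w string) int {
	a, b := []rune(v), []rune(w)
	row := make([]int, len(b)+1)
	for j := range row {
		row[j] = j
	}
	for i := 1; i <= len(a); i++ {
		diag := row[0] // previous row, column j-1
		row[0] = i
		for j := 1; j <= len(b); j++ {
			up := row[j] // previous row, column j
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			row[j] = minInt(minInt(up+1, row[j-1]+1), diag+cost)
			diag = up
		}
	}
	return row[len(b)]
}

func minInt(x, y int) int {
	if x < y {
		return x
	}
	return y
}
```

A linear scan costs one distance computation per vocabulary word on every lookup, which is acceptable for small vocabularies; larger ones usually call for an index such as a BK-tree.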

type LoadSaver added in v1.1.0

type LoadSaver interface {

	// Load restores the state of the object from r.
	Load(r io.Reader) error

	// Save stores the state of the object into w.
	Save(w io.Writer) error
}

LoadSaver denotes an object that can store and restore its state.
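Any component can satisfy LoadSaver by serializing its state to the writer and reading it back from the reader. The sketch below uses encoding/gob from the standard library; the vocabulary type and the roundTrip helper are hypothetical, and the package's actual serialization format is not documented here.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"io"
)

// LoadSaver mirrors the ptpp.LoadSaver interface.
type LoadSaver interface {
	Load(r io.Reader) error
	Save(w io.Writer) error
}

// vocabulary is a hypothetical component whose state is a set of words.
type vocabulary struct {
	words map[string]bool
}

// Compile-time check that vocabulary satisfies LoadSaver.
var _ LoadSaver = (*vocabulary)(nil)

// Save writes the word set to w using encoding/gob.
func (v *vocabulary) Save(w io.Writer) error {
	return gob.NewEncoder(w).Encode(v.words)
}

// Load replaces the word set with the one decoded from r.
func (v *vocabulary) Load(r io.Reader) error {
	var words map[string]bool
	if err := gob.NewDecoder(r).Decode(&words); err != nil {
		return err
	}
	v.words = words
	return nil
}

// roundTrip copies state from src into dst through an in-memory buffer, a
// quick way to check that Save and Load agree with each other.
func roundTrip(src, dst LoadSaver) error {
	var buf bytes.Buffer
	if err := src.Save(&buf); err != nil {
		return err
	}
	return dst.Load(&buf)
}
```

Because Load only replaces the state after a successful decode, a corrupt input leaves the previous state intact.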

type Processor

type Processor struct {

	// SpellChecker is the word spell-checker. If this field is nil, the
	// preprocessor will use DefaultSpellChecker.
	SpellChecker SpellChecker

	// SemanticMatcher is the semantic matcher to find the best suggestion. If
	// this field is nil, the preprocessor will use DefaultSemanticMatcher.
	SemanticMatcher SemanticMatcher
	// contains filtered or unexported fields
}

Processor is the Persian text preprocessor.

func (*Processor) Load added in v1.1.0

func (p *Processor) Load(filePath string) error

Load restores the state of the processor from filePath.

func (*Processor) Process

func (p *Processor) Process(r io.Reader) ([]string, error)

Process preprocesses the text read from r and extracts phrases.

func (*Processor) Save added in v1.1.0

func (p *Processor) Save(filePath string) (err error)

Save stores the state of the processor into the file at filePath.

func (*Processor) Train

func (p *Processor) Train(phrases []string)

Train trains the preprocessing model with a list of phrases.

type SemanticMatcher

type SemanticMatcher interface {

	// Match finds the best suggestion based on the context. If the best
	// suggestion correlates with the context, it returns true for matched,
	// otherwise it returns false.
	Match(context string, suggestions []string) (best string, matched bool)

	// Train trains the semantic models with a word and its context.
	Train(context, word string)
}

SemanticMatcher selects the best suggestion based on its context.

type SpellChecker

type SpellChecker interface {

	// Check finds correct spelling suggestions for a word. The resulting
	// suggestion list should not be empty.
	Check(word string) []string

	// Train trains the suggestion model with a list of words.
	Train(words []string)
}

SpellChecker is a word spell-checker.
