nlp

v0.0.0-...-39fec05
Published: Sep 18, 2017 License: MIT Imports: 10 Imported by: 0

README

nlp

nlp is a general-purpose, language-agnostic Natural Language Processor that parses the data inside a text and returns a filled model.

Supported types

int  int8  int16  int32  int64
uint uint8 uint16 uint32 uint64
float32 float64
string
time.Time
time.Duration

Installation

// go1.8+ is required
go get -u github.com/shixzie/nlp

Feel free to create PRs and open issues :)

How it works

You always begin by creating an NL value by calling nlp.New(). The NL type is a Natural Language Processor with 3 methods: RegisterModel(), Learn() and P().

RegisterModel(i interface{}, samples []string, ops ...ModelOption) error

RegisterModel takes 3 parameters, an empty struct, a set of samples and some options for the model.

The empty struct lets nlp know all possible values inside the text, for example:

type Song struct {
	Name        string // fields must be exported
	Artist      string
	ReleasedAt  time.Time
}
err := nl.RegisterModel(Song{}, someSamples, nlp.WithTimeFormat("2006"))
if err != nil {
	panic(err)
}
// ...

This tells nlp that the text may contain a Song.Name, a Song.Artist and a Song.ReleasedAt.
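
The requirement that fields be exported follows from how Go reflection works: code outside a package cannot set unexported fields. The sketch below lists the fields a library could see on a model, using only the standard reflect package. The function exportedFields and the internalID field are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

type Song struct {
	Name       string
	Artist     string
	ReleasedAt time.Time
	internalID int // unexported: cannot be filled from outside the package
}

// exportedFields is a hypothetical helper that returns the names of the
// exported fields of the struct value i, or nil if i is not a struct.
func exportedFields(i interface{}) []string {
	t := reflect.TypeOf(i)
	if t == nil || t.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	for n := 0; n < t.NumField(); n++ {
		f := t.Field(n)
		if f.PkgPath == "" { // PkgPath is empty for exported fields
			names = append(names, f.Name)
		}
	}
	return names
}

func main() {
	fmt.Println(exportedFields(Song{}))
}
```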

The samples are the key part of nlp, not just because they set the limits between keywords but also because they are used to choose which model handles an expression.

Samples must have a special syntax to set those limits and keywords.

songSamples := []string{
	"play {Name} by {Artist}",
	"play {Name} from {Artist}",
	"play {Name}",
	"from {Artist} play {Name}",
	"play something from {ReleasedAt}",
}

In the example above, we're referring to the Name and Artist fields of the Song type declared earlier. Both {Name} and {Artist} are our keywords and, yes, you guessed it: everything between play and by will be treated as a {Name}, and everything after by will be treated as an {Artist}, meaning that play and by are our limits.

     limits
 ┌─────┴─────┐
┌┴─┐        ┌┴┐
play {Name} by  {Artist}
     └─┬──┘     └───┬──┘
       └──────┬─────┘
           keywords
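
To make the distinction concrete, here is a minimal sketch that splits a sample into limits and keywords. It assumes, for simplicity, that every limit and keyword is a whitespace-separated word; splitSample and token are hypothetical names, and the real sample parser in nlp may behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// token is one word of a sample: either a keyword ({Field}) or a limit.
type token struct {
	Text    string
	Keyword bool
}

// splitSample is a hypothetical helper that classifies each word of a sample.
// Words wrapped in braces become keywords (braces stripped); the rest are limits.
func splitSample(sample string) []token {
	var tokens []token
	for _, word := range strings.Fields(sample) {
		if len(word) > 2 && strings.HasPrefix(word, "{") && strings.HasSuffix(word, "}") {
			tokens = append(tokens, token{Text: word[1 : len(word)-1], Keyword: true})
		} else {
			tokens = append(tokens, token{Text: word})
		}
	}
	return tokens
}

func main() {
	for _, t := range splitSample("play {Name} by {Artist}") {
		kind := "limit"
		if t.Keyword {
			kind = "keyword"
		}
		fmt.Printf("%-8s %s\n", kind, t.Text)
	}
}
```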

Any character can be a limit; a comma (,), for example, can be used as a limit.

Keywords as well as limits are case-sensitive, so be sure to type them right.

Note that putting 2 keywords next to each other, with no limit between them, means that only 1 or none of them will be detected.
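
Since adjacent keywords are easy to write by accident, it can help to check samples before registering them. The sketch below is one possible check, assuming keywords are whitespace-separated words wrapped in braces; adjacentKeywords is a hypothetical helper, not something nlp provides.

```go
package main

import (
	"fmt"
	"strings"
)

// adjacentKeywords is a hypothetical lint-style helper. It reports whether a
// sample contains two keywords with no limit between them.
func adjacentKeywords(sample string) bool {
	prevKeyword := false
	for _, word := range strings.Fields(sample) {
		isKeyword := len(word) > 2 && strings.HasPrefix(word, "{") && strings.HasSuffix(word, "}")
		if isKeyword && prevKeyword {
			return true
		}
		prevKeyword = isKeyword
	}
	return false
}

func main() {
	samples := []string{
		"play {Name} by {Artist}",
		"play {Name} {Artist}", // ambiguous: where does Name end?
	}
	for _, s := range samples {
		fmt.Printf("%-26q adjacent keywords: %v\n", s, adjacentKeywords(s))
	}
}
```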

limits are important - Me :3

Learn() error

Learn maps all models' samples to their respective models using the Naive Bayes algorithm. Learn() also trains all registered models so they're able to fit expressions in the future.

// must be called after all models are registered and before calling nl.P()
err := nl.Learn() 
if err != nil {
	panic(err)
}
// ...

Once the algorithm has finished learning, we're ready to start processing texts.

Note that you must call NL.Learn() after all models are registered and before calling NL.P().
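
For readers unfamiliar with Naive Bayes, the sketch below shows the general idea of picking a model from the words of an expression. It is a tiny multinomial Naive Bayes with add-one smoothing written for this illustration, not nlp's implementation. The classifier type, the Weather model and its samples are hypothetical, and skipping keywords during training is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// classifier is a tiny multinomial Naive Bayes over whitespace-separated words.
type classifier struct {
	wordCounts map[string]map[string]int // class -> word -> occurrences
	totalWords map[string]int            // class -> total words seen
	docs       map[string]int            // class -> number of samples
	vocab      map[string]bool           // every word seen in training
	totalDocs  int
}

func newClassifier() *classifier {
	return &classifier{
		wordCounts: map[string]map[string]int{},
		totalWords: map[string]int{},
		docs:       map[string]int{},
		vocab:      map[string]bool{},
	}
}

// learn adds the samples of one class (model name) to the training data.
func (c *classifier) learn(class string, samples []string) {
	if c.wordCounts[class] == nil {
		c.wordCounts[class] = map[string]int{}
	}
	for _, s := range samples {
		c.docs[class]++
		c.totalDocs++
		for _, w := range strings.Fields(s) {
			if strings.HasPrefix(w, "{") {
				continue // assumption: only limits carry signal, keywords are skipped
			}
			c.wordCounts[class][w]++
			c.totalWords[class]++
			c.vocab[w] = true
		}
	}
}

// classify returns the class with the highest log-probability for expr.
func (c *classifier) classify(expr string) string {
	best, bestScore := "", math.Inf(-1)
	for class, n := range c.docs {
		score := math.Log(float64(n) / float64(c.totalDocs)) // class prior
		for _, w := range strings.Fields(expr) {
			if !c.vocab[w] {
				continue // words never seen in training are ignored
			}
			// add-one (Laplace) smoothing avoids zero probabilities
			p := float64(c.wordCounts[class][w]+1) / float64(c.totalWords[class]+len(c.vocab))
			score += math.Log(p)
		}
		if score > bestScore {
			best, bestScore = class, score
		}
	}
	return best
}

func main() {
	c := newClassifier()
	c.learn("Song", []string{"play {Name} by {Artist}", "play {Name}", "from {Artist} play {Name}"})
	c.learn("Weather", []string{"weather in {City}", "forecast for {City}"})
	fmt.Println(c.classify("please play King by Lauren Aquilina"))
	fmt.Println(c.classify("what is the weather in Lima"))
}
```

Working in log space keeps the product of many small probabilities from underflowing, which is the usual reason Naive Bayes implementations sum logarithms.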

P(expr string) interface{}

P first asks the trained algorithm which model should be used. Once we have the right, already trained model, we make it fit the expression.

Note that everything in the expression must be separated by a space or tab.

When processing an expression, nlp searches for the limits inside that expression and evaluates which sample fits the expression best. It doesn't matter if the text contains trash. In this example:
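
The sketch below shows why the separators matter, under the assumption that nlp splits expressions roughly the way strings.Fields does (on runs of spaces and tabs). The helper containsLimit is hypothetical and only illustrates the effect.

```go
package main

import (
	"fmt"
	"strings"
)

// containsLimit is a hypothetical helper. It reports whether limit appears in
// expr as its own whitespace-separated word.
func containsLimit(expr, limit string) bool {
	for _, word := range strings.Fields(expr) {
		if word == limit {
			return true
		}
	}
	return false
}

func main() {
	// A tab works as well as a space.
	fmt.Println(containsLimit("play King by\tLauren Aquilina", "by"))
	// Without a separator the limit is swallowed by the neighbouring word.
	fmt.Println(containsLimit("play Kingby Lauren Aquilina", "by"))
}
```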

     limits
 ┌─────┴─────┐
┌┴─┐        ┌┴┐
play {Name} by  {Artist}
     └─┬──┘     └───┬──┘
       └──────┬─────┘
           keywords

we have 2 limits, play and by. It doesn't matter if the expression were hello sir can you pleeeeeease play King by Lauren Aquilina, since:

                                  limits
            trash              ┌────┴────┐
┌─────────────┴─────────────┐ ┌┴─┐      ┌┴┐
hello sir can you pleeeeeease play King by  Lauren Aquilina
                                   └┬─┘     └─────┬───────┘
                                 {Name}       {Artist}
                                 └─┬──┘       └───┬──┘
                                   └──────┬───────┘
                                       keywords

{Name} would be replaced with King and {Artist} with Lauren Aquilina. The trash would be ignored, as would the limits play and by. A pointer to a filled struct of the type used to register the model (Song) is then returned, with Song.Name being {Name} and Song.Artist being {Artist}.
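
The walkthrough above can be approximated in a few lines of Go. This is a simplified sketch under stated assumptions (whitespace-separated words, limits matched in order, no adjacent keywords), not the algorithm nlp actually uses. The names fit, isKeyword and indexFrom are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

func isKeyword(w string) bool {
	return len(w) > 2 && strings.HasPrefix(w, "{") && strings.HasSuffix(w, "}")
}

// indexFrom returns the index of target in words at or after from, or -1.
func indexFrom(words []string, target string, from int) int {
	for i := from; i < len(words); i++ {
		if words[i] == target {
			return i
		}
	}
	return -1
}

// fit is a hypothetical matcher. It walks the sample left to right, skips
// trash until each limit is found, and captures the words between limits as
// keyword values. It returns false if the expression does not fit.
func fit(sample, expr string) (map[string]string, bool) {
	pattern := strings.Fields(sample)
	words := strings.Fields(expr)
	values := map[string]string{}
	pos := 0 // index of the next unread word in expr
	for i, p := range pattern {
		if !isKeyword(p) {
			at := indexFrom(words, p, pos) // anything before the limit is trash
			if at < 0 {
				return nil, false
			}
			pos = at + 1
			continue
		}
		end := len(words) // a trailing keyword captures the rest of the expression
		if i+1 < len(pattern) {
			end = indexFrom(words, pattern[i+1], pos)
		}
		if end <= pos {
			return nil, false // next limit missing, or nothing to capture
		}
		values[p[1:len(p)-1]] = strings.Join(words[pos:end], " ")
		pos = end
	}
	return values, true
}

func main() {
	values, ok := fit("play {Name} by {Artist}",
		"hello sir can you pleeeeeease play King by Lauren Aquilina")
	fmt.Println(ok, values["Name"], "/", values["Artist"])
}
```

Because limits are compared with ==, the sketch is case-sensitive in the same way the README describes: an expression starting with Play would not fit a sample whose limit is play.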

Usage

package main

import (
	"fmt"
	"time"

	"github.com/shixzie/nlp"
)

type Song struct {
	Name       string
	Artist     string
	ReleasedAt time.Time
}

func main() {
	songSamples := []string{
		"play {Name} by {Artist}",
		"play {Name} from {Artist}",
		"play {Name}",
		"from {Artist} play {Name}",
		"play something from {ReleasedAt}",
	}

	nl := nlp.New()
	err := nl.RegisterModel(Song{}, songSamples, nlp.WithTimeFormat("2006"))
	if err != nil {
		panic(err)
	}

	err = nl.Learn() // you must call Learn after all models are registered and before calling P
	if err != nil {
		panic(err)
	}

	// after learning you can call P as many times as you want
	s := nl.P("hello sir can you pleeeeeease play King by Lauren Aquilina")
	if song, ok := s.(*Song); ok {
		fmt.Println("Success")
		fmt.Printf("%#v\n", song)
	} else {
		fmt.Println("Failed")
	}
}

// Prints (the zero-valued ReleasedAt field is elided here)
//
// Success
// &main.Song{Name:"King", Artist:"Lauren Aquilina", ReleasedAt:...}
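
When more than one model is registered, a type switch on the value returned by P is a natural way to branch. The sketch below shows the pattern with plain Go values. The Alarm model and the describe function are hypothetical, and since this README does not say what P returns when nothing fits, the nil and default branches are purely defensive.

```go
package main

import "fmt"

type Song struct {
	Name   string
	Artist string
}

// Alarm is a hypothetical second model used only for this illustration.
type Alarm struct {
	Label string
}

// describe branches on the dynamic type of a result such as the one P returns.
func describe(result interface{}) string {
	switch v := result.(type) {
	case *Song:
		return fmt.Sprintf("song %q by %q", v.Name, v.Artist)
	case *Alarm:
		return fmt.Sprintf("alarm %q", v.Label)
	case nil:
		return "no model matched"
	default:
		return fmt.Sprintf("unexpected type %T", v)
	}
}

func main() {
	fmt.Println(describe(&Song{Name: "King", Artist: "Lauren Aquilina"}))
	fmt.Println(describe(&Alarm{Label: "wake up"}))
	fmt.Println(describe(nil))
}
```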

Documentation

Overview

Package nlp provides general purpose Natural Language Processing.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ModelOption

type ModelOption func(*model) error

ModelOption is an option for a specific model

func WithTimeFormat

func WithTimeFormat(format string) ModelOption

WithTimeFormat sets the format used in time.Parse(format, val). The default format is 01-02-2006_3:04pm; note that format can't contain any spaces.

func WithTimeLocation

func WithTimeLocation(loc *time.Location) ModelOption

WithTimeLocation sets the location used in time.ParseInLocation(format, value, loc). The default is time.Local.

type NL

type NL struct {

	// Output contains the training output for the
	// NaiveBayes algorithm
	Output *bytes.Buffer
	// contains filtered or unexported fields
}

NL is a Natural Language Processor

func New

func New() *NL

New returns a *NL

func (*NL) Learn

func (nl *NL) Learn() error

Learn maps the models' samples to the models themselves and returns an error if something went wrong while learning.

func (*NL) P

func (nl *NL) P(expr string) interface{}

P processes the expr and returns one of the types passed as the i parameter to the RegisterModel func, filled with the data inside expr.

func (*NL) RegisterModel

func (nl *NL) RegisterModel(i interface{}, samples []string, ops ...ModelOption) error

RegisterModel registers a model i and creates possible patterns from samples. The default layout when parsing time is 01-02-2006_3:04pm and the default location is time.Local. Samples must have special formatting:

"play {Name} by {Artist}"

Directories

Path Synopsis
Package parser contains the sample parser for nlp
