sentiment

v0.0.0-...-c697f64 | Published: Jun 17, 2020 | License: MIT | Imports: 14 | Imported by: 0

Repository

github.com/PtNan/sentiment

README

Sentiment

Simple, Drop-In Sentiment Analysis in Golang

This package relies on the work done in my other package, goml, for multiclass text classification.

Sentiment lets you pass strings into a function and get an estimate of the sentiment of the string (in English) using a very simple probabilistic model. The model is trained on a dataset of IMDB movie reviews labeled by sentiment. For single-word classification, the returned values are the score, 0 (negative) or 1 (positive), and the probability in [0,1] that the word belongs to that class. For document sentiment only the class is given, because the probabilities would otherwise underflow.

Implemented Languages

If you want to implement another language, open an issue or email me. It really is not hard (if you have a dataset).

English
- dataset: IMDB Reviews

Model

Sentiment uses a Naive Bayes classification model for prediction. It has pluses and minuses, but Naive Bayes tends to do well for text classification.
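A minimal two-class multinomial Naive Bayes classifier with add-one (Laplace) smoothing is sketched below to show the idea. It is an illustration under simple assumptions (whitespace tokenization, lowercased words, class 0 = negative and 1 = positive), not goml's text.NaiveBayes, whose tokenization and smoothing may differ. The training sentences are made up for the example.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// NaiveBayes is an illustrative two-class multinomial Naive Bayes
// classifier with add-one smoothing (not goml's implementation).
type NaiveBayes struct {
	docs       [2]int            // documents seen per class
	wordTotals [2]int            // total word count per class
	counts     [2]map[string]int // per-class word frequencies
	vocab      map[string]bool   // every word seen in training
}

func NewNaiveBayes() *NaiveBayes {
	return &NaiveBayes{
		counts: [2]map[string]int{{}, {}},
		vocab:  map[string]bool{},
	}
}

// Train adds one labeled document (class must be 0 or 1).
func (nb *NaiveBayes) Train(doc string, class uint8) {
	nb.docs[class]++
	for _, w := range strings.Fields(strings.ToLower(doc)) {
		nb.counts[class][w]++
		nb.wordTotals[class]++
		nb.vocab[w] = true
	}
}

// Predict returns the class with the highest log-posterior.
func (nb *NaiveBayes) Predict(doc string) uint8 {
	totalDocs := float64(nb.docs[0] + nb.docs[1])
	v := float64(len(nb.vocab))
	best, bestScore := uint8(0), math.Inf(-1)
	for c := uint8(0); c < 2; c++ {
		score := math.Log(float64(nb.docs[c]) / totalDocs) // log prior
		for _, w := range strings.Fields(strings.ToLower(doc)) {
			// Add-one smoothing keeps unseen words from zeroing the score.
			p := (float64(nb.counts[c][w]) + 1) / (float64(nb.wordTotals[c]) + v)
			score += math.Log(p)
		}
		if score > bestScore {
			best, bestScore = c, score
		}
	}
	return best
}

func main() {
	nb := NewNaiveBayes()
	nb.Train("an awful boring movie", 0)
	nb.Train("terrible acting and an awful plot", 0)
	nb.Train("a wonderful moving film", 1)
	nb.Train("great acting and a wonderful story", 1)
	fmt.Println(nb.Predict("what an awful plot"))      // 0
	fmt.Println(nb.Predict("a wonderful great story")) // 1
}
```

Scoring in log space is what lets a classifier like this handle long documents without the underflow mentioned in the overview.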

Example

You can save the model trained on the dataset to a JSON file with PersistToFile(m Models, path string) error, so you don't have to run the training again, though training only takes about 4 seconds at most.

Training, or Restoring a Pre-Trained Model:

// Train is used within the library, but you should
// usually prefer Restore because it's faster and
// you don't have to be in the project's directory
//
// model, err := sentiment.Train()

model, err := sentiment.Restore()
if err != nil {
    panic(fmt.Sprintf("Could not restore model!\n\t%v\n", err))
}

Analysis:

// get a sentiment analysis summary
// in any implemented language
analysis := model.SentimentAnalysis("Your mother is an awful lady", sentiment.English)
// analysis.Score == 0 (negative)

LICENSE - MIT

Documentation

Index

Constants
func Asset(name string) ([]byte, error)
func AssetDir(name string) ([]string, error)
func AssetInfo(name string) (os.FileInfo, error)
func AssetNames() []string
func MustAsset(name string) []byte
func PersistToFile(m Models, path string) error
func RestoreAsset(dir, name string) error
func RestoreAssets(dir, name string) error
func SplitSentences(r rune) bool
func TrainEnglishModel(modelMap Models) error
type Analysis
type Language
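type Models
- func Restore() (Models, error)
- func RestoreModels(bytes []byte) (Models, error)
- func Train() (Models, error)
- func (m Models) SentimentAnalysis(sentence string, lang Language) *Analysis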
type Score
type SentenceScore

Constants

const (
	English            Language = "en"
	Spanish                     = "es"
	French                      = "fr"
	German                      = "de"
	Italian                     = "it"
	Arabic                      = "ar"
	Japanese                    = "ja"
	Indonesian                  = "id"
	Portugese                   = "pt"
	Korean                      = "ko"
	Turkish                     = "tr"
	Russian                     = "ru"
	Dutch                       = "nl"
	Filipino                    = "fil"
	Malay                       = "msa"
	ChineseTraditional          = "zh-tw"
	ChineseSimplified           = "zh-cn"
	Hindi                       = "hi"
	Norwegian                   = "no"
	Swedish                     = "sv"
	Finnish                     = "fi"
	Danish                      = "da"
	Polish                      = "pl"
	Hungarian                   = "hu"
	Farsi                       = "fa"
	Hebrew                      = "he"
	Urdu                        = "ur"
	Thai                        = "th"
	NoLanguage                  = ""
)

These constants hold the Twitter language codes that correspond to models. Not all of them are used initially; they are here for ease of extension. US English and UK English share the same code.

const (
	// TempDirectory is the default temporary
	// directory for persisting models to disk
	TempDirectory string = "/tmp/.sentiment"
)

Variables

This section is empty.

Functions

func Asset

func Asset(name string) ([]byte, error)

Asset loads and returns the asset for the given name. It returns an error if the asset could not be found or could not be loaded.

func AssetDir

func AssetDir(name string) ([]string, error)

AssetDir returns the file names below a certain directory embedded in the file by go-bindata. For example if you run go-bindata on data/... and data contains the following hierarchy:

data/
  foo.txt
  img/
    a.png
    b.png

then:
- AssetDir("data") would return []string{"foo.txt", "img"}
- AssetDir("data/img") would return []string{"a.png", "b.png"}
- AssetDir("foo.txt") and AssetDir("notexist") would return an error
- AssetDir("") will return []string{"data"}

func AssetInfo

func AssetInfo(name string) (os.FileInfo, error)

AssetInfo loads and returns the asset info for the given name. It returns an error if the asset could not be found or could not be loaded.

func AssetNames

func AssetNames() []string

AssetNames returns the names of the assets.

func MustAsset

func MustAsset(name string) []byte

MustAsset is like Asset but panics when Asset would return an error. It simplifies safe initialization of global variables.
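MustAsset follows the common Go "Must" convention (as in regexp.MustCompile or template.Must). The sketch below shows that pattern with a hypothetical in-memory asset map standing in for go-bindata's generated data; asset, mustAsset, and model.json are illustrative names, not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// assets is a hypothetical stand-in for go-bindata's embedded files.
var assets = map[string][]byte{
	"model.json": []byte(`{"en":{}}`),
}

// asset mimics the Asset contract: the data, or an error.
func asset(name string) ([]byte, error) {
	data, ok := assets[name]
	if !ok {
		return nil, errors.New("asset " + name + " not found")
	}
	return data, nil
}

// mustAsset mimics MustAsset: it panics instead of returning an error.
func mustAsset(name string) []byte {
	data, err := asset(name)
	if err != nil {
		panic("mustAsset: " + err.Error())
	}
	return data
}

// A package-level variable can be initialized in one expression;
// a missing asset fails loudly at program start-up instead of
// leaving a nil slice to be discovered later.
var modelJSON = mustAsset("model.json")

func main() {
	fmt.Println(len(modelJSON))
}
```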

func PersistToFile

func PersistToFile(m Models, path string) error

PersistToFile persists a Models map to the file at path, returning any error encountered.

func RestoreAsset

func RestoreAsset(dir, name string) error

RestoreAsset restores an asset under the given directory.

func RestoreAssets

func RestoreAssets(dir, name string) error

RestoreAssets restores an asset under the given directory recursively.

func SplitSentences

func SplitSentences(r rune) bool

SplitSentences takes in a rune r and returns whether the rune is a sentence delimiter ('.', '?', or '!').

It matches the func(rune) bool signature that strings.FieldsFunc expects, so it can be passed to it directly.

func TrainEnglishModel

func TrainEnglishModel(modelMap Models) error

TrainEnglishModel trains an English model on the expected IMDB datasets and adds it to modelMap, the map of models passed in. It returns any error encountered.

Types

type Analysis

type Analysis struct {
	Language  Language        `json:"lang"`
	Words     []Score         `json:"words"`
	Sentences []SentenceScore `json:"sentences,omitempty"`
	Score     uint8           `json:"score"`
}

Analysis holds the analysis of a document: its total sentiment, the sentiment of each sentence, and the sentiment of each word, along with the language code.

type Language

type Language string

Language is a language code used for differentiating sentiment models.

type Models

type Models map[Language]*text.NaiveBayes

Models holds a map from language keys to sentiment classifiers.

func Restore

func Restore() (Models, error)

Restore restores the pre-trained models from a binary asset. This is the preferred way to obtain a model; use it unless you want to train the model again.

It is a thin wrapper around RestoreModels.

func RestoreModels

func RestoreModels(bytes []byte) (Models, error)

RestoreModels takes a byte slice holding a serialized (presumably) map[Language]LanguageModel and unmarshals it into models you can use to run regular, language-specific sentiment analysis.

func Train

func Train() (Models, error)

Train trains the models and returns them. After this is run, you can call SentimentAnalysis on the result.

Note that this must be run from within the project directory! To get the model without re-training, call Restore instead.

func (Models) SentimentAnalysis

func (m Models) SentimentAnalysis(sentence string, lang Language) *Analysis

SentimentAnalysis takes in a (possibly 'dirty') sentence, or any block of text, cleans the text, finds the sentiment of each word in the text, finds the sentiment of the sentence as a whole, and returns an Analysis struct.
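The flow described above can be sketched as follows. Both the cleaning rule (lowercase, keep runs of letters and apostrophes) and the classify function are assumptions standing in for the package's real text cleaning and Naive Bayes model; the toy classifier simply flags a few hard-coded negative words, and the types are trimmed copies without JSON tags.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

type Score struct {
	Word  string
	Score uint8
}

type SentenceScore struct {
	Sentence string
	Score    uint8
}

type Analysis struct {
	Words     []Score
	Sentences []SentenceScore
	Score     uint8
}

// cleanWords lowercases text and keeps runs of letters and
// apostrophes as words (an assumed cleaning rule).
func cleanWords(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && r != '\''
	})
}

// classify is a hypothetical stand-in for the trained model:
// 0 (negative) if any listed word appears, else 1 (positive).
func classify(text string) uint8 {
	for _, w := range cleanWords(text) {
		if w == "awful" || w == "terrible" || w == "boring" {
			return 0
		}
	}
	return 1
}

// analyze scores each word, each sentence, and the whole text.
func analyze(text string) *Analysis {
	a := &Analysis{Score: classify(text)}
	for _, w := range cleanWords(text) {
		a.Words = append(a.Words, Score{Word: w, Score: classify(w)})
	}
	isDelim := func(r rune) bool { return r == '.' || r == '?' || r == '!' }
	for _, s := range strings.FieldsFunc(text, isDelim) {
		if s = strings.TrimSpace(s); s != "" {
			a.Sentences = append(a.Sentences, SentenceScore{Sentence: s, Score: classify(s)})
		}
	}
	return a
}

func main() {
	a := analyze("What an awful plot. The music was lovely!")
	fmt.Println(a.Score) // 0
	for _, s := range a.Sentences {
		fmt.Printf("%q -> %d\n", s.Sentence, s.Score)
	}
}
```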

type Score

type Score struct {
	Word  string `json:"word"`
	Score uint8  `json:"score"`
}

Score holds the score of a single word. It differs from SentenceScore only in field names and JSON marshaling, not in the field types.

type SentenceScore

type SentenceScore struct {
	Sentence string `json:"sentence"`
	Score    uint8  `json:"score"`
}

SentenceScore holds the score of a document, which could be (and probably is) a sentence
