Documentation ¶
Index ¶
- Constants
- func Asset(name string) ([]byte, error)
- func AssetDir(name string) ([]string, error)
- func AssetInfo(name string) (os.FileInfo, error)
- func AssetNames() []string
- func MustAsset(name string) []byte
- func PersistToFile(m Models, path string) error
- func RestoreAsset(dir, name string) error
- func RestoreAssets(dir, name string) error
- func SplitSentences(r rune) bool
- func TrainEnglishModel(modelMap Models) error
- type Analysis
- type Language
- type Models
- type Score
- type SentenceScore
Constants ¶
const (
	English Language   = "en"
	Spanish            = "es"
	French             = "fr"
	German             = "de"
	Italian            = "it"
	Arabic             = "ar"
	Japanese           = "ja"
	Indonesian         = "id"
	Portugese          = "pt"
	Korean             = "ko"
	Turkish            = "tr"
	Russian            = "ru"
	Dutch              = "nl"
	Filipino           = "fil"
	Malay              = "msa"
	ChineseTraditional = "zh-tw"
	ChineseSimplified  = "zh-cn"
	Hindi              = "hi"
	Norwegian          = "no"
	Swedish            = "sv"
	Finnish            = "fi"
	Danish             = "da"
	Polish             = "pl"
	Hungarian          = "hu"
	Farsi              = "fa"
	Hebrew             = "he"
	Urdu               = "ur"
	Thai               = "th"
	NoLanguage         = ""
)
Constants hold the Twitter language codes that correspond to models. Not all of these will be used initially, but they are included for ease of extension. US English is lumped together with UK English.
const (
	// TempDirectory is the default temporary
	// directory for persisting models to disk
	TempDirectory string = "/tmp/.sentiment"
)
Variables ¶
This section is empty.
Functions ¶
func Asset ¶
Asset loads and returns the asset for the given name. It returns an error if the asset could not be found or could not be loaded.
func AssetDir ¶
AssetDir returns the file names below a certain directory embedded in the file by go-bindata. For example if you run go-bindata on data/... and data contains the following hierarchy:
data/
  foo.txt
  img/
    a.png
    b.png
then
AssetDir("data") would return []string{"foo.txt", "img"}
AssetDir("data/img") would return []string{"a.png", "b.png"}
AssetDir("foo.txt") and AssetDir("notexist") would return an error
AssetDir("") will return []string{"data"}.
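The lookup rules above can be illustrated with a small sketch. This is not the code go-bindata generates; `assetPaths` and `assetDir` are hypothetical names that only mimic the documented results for the example hierarchy.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// assetPaths is a hypothetical stand-in for the embedded file list.
var assetPaths = []string{"data/foo.txt", "data/img/a.png", "data/img/b.png"}

// assetDir is a simplified, hypothetical imitation of the documented
// AssetDir behavior. It is not the generated go-bindata implementation.
func assetDir(name string) ([]string, error) {
	prefix := ""
	if name != "" {
		prefix = strings.TrimSuffix(name, "/") + "/"
	}
	seen := map[string]bool{}
	for _, p := range assetPaths {
		if !strings.HasPrefix(p, prefix) {
			continue
		}
		// Keep only the first path element below the requested directory.
		rest := strings.TrimPrefix(p, prefix)
		seen[strings.SplitN(rest, "/", 2)[0]] = true
	}
	if len(seen) == 0 {
		// Files and unknown names have no children, so they are errors.
		return nil, fmt.Errorf("asset directory %q not found", name)
	}
	children := make([]string, 0, len(seen))
	for c := range seen {
		children = append(children, c)
	}
	sort.Strings(children)
	return children, nil
}

func main() {
	for _, dir := range []string{"", "data", "data/img", "foo.txt", "notexist"} {
		children, err := assetDir(dir)
		fmt.Printf("%q -> %v, err=%v\n", dir, children, err)
	}
}
```

The sketch sorts the children for deterministic output; the ordering returned by the real AssetDir is not stated here.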
func AssetInfo ¶
AssetInfo loads and returns the asset info for the given name. It returns an error if the asset could not be found or could not be loaded.
func MustAsset ¶
MustAsset is like Asset but panics when Asset would return an error. It simplifies safe initialization of global variables.
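As a rough illustration of why a panicking variant helps with global initialization, the sketch below uses hypothetical lowercase stand-ins (`assets`, `asset`, `mustAsset`) backed by an in-memory map. It mirrors only the documented contract, not the generated code.

```go
package main

import "fmt"

// assets is a hypothetical in-memory asset table used only for this sketch.
var assets = map[string][]byte{"greeting.txt": []byte("hello")}

// asset mirrors the documented contract of Asset: the bytes, or an error
// when the name is unknown.
func asset(name string) ([]byte, error) {
	data, ok := assets[name]
	if !ok {
		return nil, fmt.Errorf("asset %s not found", name)
	}
	return data, nil
}

// mustAsset mirrors MustAsset: the same lookup, but it panics instead of
// returning an error.
func mustAsset(name string) []byte {
	data, err := asset(name)
	if err != nil {
		panic("asset: " + err.Error())
	}
	return data
}

// greeting is set during package initialization, where there is no caller
// to hand an error back to; a single-value function fits this spot.
var greeting = mustAsset("greeting.txt")

func main() {
	fmt.Println(string(greeting))
}
```

A missing asset then fails loudly at program start rather than surfacing later as an empty value.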
func PersistToFile ¶
PersistToFile persists a Models struct to a file path, returning any error encountered.
func RestoreAsset ¶
RestoreAsset restores an asset under the given directory
func RestoreAssets ¶
RestoreAssets restores an asset under the given directory recursively
func SplitSentences ¶
SplitSentences takes in a rune r and returns whether the rune is a sentence delimiter ('.', '?', or '!').
It has the func(rune) bool signature that strings.FieldsFunc() expects for its splitting function.
func TrainEnglishModel ¶
TrainEnglishModel trains an English model on the expected IMDB datasets and adds it to the given map of models. It returns any error encountered.
Types ¶
type Analysis ¶
type Analysis struct {
	Language  Language        `json:"lang"`
	Words     []Score         `json:"words"`
	Sentences []SentenceScore `json:"sentences,omitempty"`
	Score     uint8           `json:"score"`
}
Analysis holds the analysis of a document, split into total sentiment, individual sentence sentiment, and individual word sentiment, along with the language code.
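The struct tags determine how an Analysis is serialized to JSON. The sketch below redeclares the types locally so that it stands alone. The Analysis fields and tags are copied from the definition above, but the fields of `Score` and `SentenceScore` are not listed on this page, so the ones used here are assumptions made purely for illustration, as is the helper `toJSON`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Language string

// Score and SentenceScore: the field names and tags below are assumed,
// not taken from the package.
type Score struct {
	Word  string `json:"word"`
	Score uint8  `json:"score"`
}

type SentenceScore struct {
	Sentence string `json:"sentence"`
	Score    uint8  `json:"score"`
}

// Analysis copies the documented fields and JSON tags.
type Analysis struct {
	Language  Language        `json:"lang"`
	Words     []Score         `json:"words"`
	Sentences []SentenceScore `json:"sentences,omitempty"`
	Score     uint8           `json:"score"`
}

// toJSON is a hypothetical helper that serializes an Analysis.
func toJSON(a Analysis) (string, error) {
	b, err := json.Marshal(a)
	return string(b), err
}

func main() {
	a := Analysis{
		Language: "en",
		Words:    []Score{{Word: "great", Score: 1}},
		Score:    1,
		// Sentences is left nil, so omitempty drops the key entirely.
	}
	out, err := toJSON(a)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because only Sentences carries omitempty, it is the one key that disappears from the output when it is empty.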
type Language ¶
type Language string
Language is a language code used for differentiating sentiment models
type Models ¶
type Models map[Language]*text.NaiveBayes
Models holds a map from language keys to sentiment classifiers.
func Restore ¶
Restore restores a pre-trained model from a binary asset. This is the preferred way to obtain a model; use it unless you want to train the model again.
It is a thin wrapper around RestoreModels.
func RestoreModels ¶
RestoreModels takes in the bytes of a (presumably) serialized map[Language]LanguageModel and unmarshals them into a model that you can use to run regular, language-specific sentiment analysis.
func Train ¶
Train takes in a directory path to persist the model to, trains the model, and saves the model to the given file. After it has run, you can use the SentimentXXX functions.
Note that this must be run from within the project directory! To get the model without retraining, call Restore instead.
func (Models) SentimentAnalysis ¶
SentimentAnalysis takes in a (possibly 'dirty') sentence (or any block of text), cleans the text, finds the sentiment of each word in the text, finds the sentiment of the sentence as a whole, and returns an Analysis struct.
type Score ¶
Score holds the score of a single word. It differs from SentenceScore only in parameter names and JSON marshaling, not in the actual types.
type SentenceScore ¶
SentenceScore holds the score of a document, which could be (and probably is) a sentence