model

package
v0.0.0-...-ba2758a
Published: Nov 20, 2019 License: Apache-2.0 Imports: 2 Imported by: 0

README

Model

Word2Vec

Word2Vec is composed of the following modules:

Model:

  • Skip-Gram
  • CBOW

Optimizer:

  • Hierarchical Softmax
  • Negative Sampling

Usage
Word2Vec: Continuous Bag-of-Words and Skip-gram model

Usage:
  wego word2vec [flags]

Flags:
      --batchSize int       interval word size to update learning rate (default 10000)
  -d, --dimension int       dimension of word vector (default 10)
  -h, --help                help for word2vec
      --initlr float        initial learning rate (default 0.025)
  -i, --inputFile string    input file path for corpus (default "example/input.txt")
      --iter int            number of iteration (default 15)
      --lower               whether the words on corpus convert to lowercase or not
      --maxDepth int        times to track huffman tree, max-depth=0 means to track full path from root to word (for hierarchical softmax only)
      --min-count int       lower limit to filter rare words (default 5)
      --model string        which model does it use? one of: cbow|skip-gram (default "cbow")
      --optimizer string    which optimizer does it use? one of: hs|ns (default "hs")
  -o, --outputFile string   output file path to save word vectors (default "example/word_vectors.txt")
      --prof                profiling mode to check the performances
      --sample int          negative sample size(for negative sampling only) (default 5)
      --theta float         lower limit of learning rate (lr >= initlr * theta) (default 0.0001)
      --thread int          number of goroutine (default 8)
      --threshold float     threshold for subsampling (default 0.001)
      --verbose             verbose mode
  -w, --window int          context window size (default 5)
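The `--initlr`, `--theta`, and `--batchSize` flags together describe a learning rate that is re-evaluated periodically and never drops below `initlr * theta`. A minimal sketch of such a schedule follows. The linear decay and the helper name `decayedLR` are assumptions for illustration; the help text above confirms only the lower bound.

```go
package main

import "fmt"

// decayedLR is a hypothetical helper (not part of this package's API).
// It assumes a linear decay with training progress and clamps the result
// at initlr * theta, the lower bound described by the --theta flag.
func decayedLR(initlr, theta float64, processed, total int) float64 {
	lr := initlr * (1.0 - float64(processed)/float64(total))
	if floor := initlr * theta; lr < floor {
		return floor
	}
	return lr
}

func main() {
	const initlr, theta = 0.025, 0.0001
	total := 1000000
	// Re-evaluate the rate at fixed word intervals, in the spirit of --batchSize.
	for processed := 0; processed <= total; processed += 250000 {
		fmt.Printf("words=%d lr=%g\n", processed, decayedLR(initlr, theta, processed, total))
	}
}
```

With the default `theta` of 0.0001, the floor only takes effect near the very end of training.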

GloVe

GloVe is a weighted matrix factorization model over the word co-occurrence matrix.

Usage
GloVe: Global Vectors for Word Representation

Usage:
  wego glove [flags]

Flags:
      --alpha float         exponent of weighting function (default 0.75)
  -d, --dimension int       dimension of word vector (default 10)
  -h, --help                help for glove
      --initlr float        initial learning rate (default 0.025)
  -i, --inputFile string    input file path for corpus (default "example/input.txt")
      --iter int            number of iteration (default 15)
      --lower               whether the words on corpus convert to lowercase or not
      --min-count int       lower limit to filter rare words (default 5)
  -o, --outputFile string   output file path to save word vectors (default "example/word_vectors.txt")
      --prof                profiling mode to check the performances
      --solver string       solver for GloVe objective. One of: sgd|adagrad (default "sgd")
      --thread int          number of goroutine (default 8)
      --verbose             verbose mode
  -w, --window int          context window size (default 5)
      --xmax int            specifying cutoff in weighting function (default 100)
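The `--alpha` and `--xmax` flags parameterize the GloVe weighting function, which down-weights rare co-occurrences and caps the weight of frequent ones. The sketch below follows the definition in the GloVe paper; the function name `weight` is illustrative and not taken from this package.

```go
package main

import (
	"fmt"
	"math"
)

// weight sketches the GloVe weighting function f(x):
// (x/xmax)^alpha below the cutoff, and 1 at or above it.
func weight(x, xmax, alpha float64) float64 {
	if x < xmax {
		return math.Pow(x/xmax, alpha)
	}
	return 1.0
}

func main() {
	// Uses the CLI defaults: xmax = 100, alpha = 0.75.
	for _, x := range []float64{1, 10, 50, 100, 500} {
		fmt.Printf("x=%v f(x)=%.4f\n", x, weight(x, 100, 0.75))
	}
}
```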

LexVec

Usage
Lexvec: Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations

Usage:
  wego lexvec [flags]

Flags:
      --batchSize int       interval word size to update learning rate (default 10000)
  -d, --dimension int       dimension of word vector (default 10)
  -h, --help                help for lexvec
      --initlr float        initial learning rate (default 0.025)
  -i, --inputFile string    input file path for corpus (default "example/input.txt")
      --iter int            number of iteration (default 15)
      --lower               whether the words on corpus convert to lowercase or not
      --min-count int       lower limit to filter rare words (default 5)
  -o, --outputFile string   output file path to save word vectors (default "example/word_vectors.txt")
      --prof                profiling mode to check the performances
      --rel string          relation type for counting co-occurrence. One of ppmi|pmi|co|logco (default "ppmi")
      --sample int          negative sample size(for negative sampling only) (default 5)
      --save-vec string     save vector type. One of: normal|add (default "normal")
      --smooth float        smoothing value (default 0.75)
      --theta float         lower limit of learning rate (lr >= initlr * theta) (default 0.0001)
      --thread int          number of goroutine (default 12)
      --verbose             verbose mode
  -w, --window int          context window size (default 5)

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func IndexPerThread

func IndexPerThread(threadSize, dataSize int) []int

IndexPerThread creates the intervals of data indices assigned to each thread.
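One way to partition data across workers is to return boundary indices, as sketched below. The function `splitIndices` is a hypothetical stand-in: the layout of the slice returned by IndexPerThread is not documented here, so the `threadSize+1` boundary convention is an assumption.

```go
package main

import "fmt"

// splitIndices is a hypothetical stand-in for IndexPerThread, not the
// package's implementation. It returns threadSize+1 boundaries so that
// worker i handles the half-open interval [bounds[i], bounds[i+1]).
func splitIndices(threadSize, dataSize int) []int {
	bounds := make([]int, threadSize+1)
	for i := 0; i <= threadSize; i++ {
		bounds[i] = i * dataSize / threadSize
	}
	return bounds
}

func main() {
	bounds := splitIndices(4, 10)
	for i := 0; i < len(bounds)-1; i++ {
		fmt.Printf("worker %d: [%d, %d)\n", i, bounds[i], bounds[i+1])
	}
}
```

Computing each boundary as `i * dataSize / threadSize` spreads the remainder across workers, so interval sizes differ by at most one.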

func NextRandom

func NextRandom(value int) int

NextRandom is a linear congruential generator that behaves like rand.Intn(window).
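A linear congruential generator advances its state as `state = state*a + c` and reduces the result to the requested range. The sketch below uses the multiplier and increment from the original word2vec C code; whether NextRandom uses the same constants or keeps its state at package level is an assumption, and the `lcg` type is hypothetical.

```go
package main

import "fmt"

// lcg is a hypothetical generator type, not part of this package.
// The constants 25214903917 and 11 come from the original word2vec C code.
type lcg struct {
	state uint64
}

// next advances the state and returns a value in [0, n), like rand.Intn(n).
// Unsigned overflow wraps around, which is the intended modulus 2^64.
func (g *lcg) next(n int) int {
	g.state = g.state*25214903917 + 11
	return int(g.state % uint64(n))
}

func main() {
	g := &lcg{state: 1}
	for i := 0; i < 5; i++ {
		fmt.Println(g.next(5)) // e.g. a random window offset
	}
}
```

The sequence is fully determined by the seed, which makes training runs reproducible but also means concurrent callers sharing one state would need synchronization.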

Types

type Model

type Model interface {
	Train(f io.Reader) error
	Save(outputFile string) error
	Get() (map[string][]float64, error)
}

Model is the interface implemented by each embedding model. It provides Train, Save, and Get.
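To illustrate how a caller might drive the interface, the sketch below redeclares Model locally so it compiles on its own and pairs it with a toy implementation. The `countModel` type and the `trainFromString` helper are hypothetical; a real model would learn dense vectors rather than word counts.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// Model mirrors the interface documented above so the sketch is standalone.
type Model interface {
	Train(f io.Reader) error
	Save(outputFile string) error
	Get() (map[string][]float64, error)
}

// countModel is a hypothetical toy model: each "vector" has one
// dimension holding the word's frequency in the corpus.
type countModel struct {
	vectors map[string][]float64
}

func (m *countModel) Train(f io.Reader) error {
	data, err := io.ReadAll(f)
	if err != nil {
		return err
	}
	m.vectors = make(map[string][]float64)
	for _, w := range strings.Fields(string(data)) {
		if _, ok := m.vectors[w]; !ok {
			m.vectors[w] = []float64{0}
		}
		m.vectors[w][0]++
	}
	return nil
}

// Save writes one "word value" pair per line.
func (m *countModel) Save(outputFile string) error {
	var sb strings.Builder
	for w, v := range m.vectors {
		fmt.Fprintf(&sb, "%s %v\n", w, v[0])
	}
	return os.WriteFile(outputFile, []byte(sb.String()), 0o644)
}

func (m *countModel) Get() (map[string][]float64, error) {
	if m.vectors == nil {
		return nil, errors.New("model is not trained")
	}
	return m.vectors, nil
}

// trainFromString trains any Model on an in-memory corpus and returns its vectors.
func trainFromString(m Model, corpus string) (map[string][]float64, error) {
	if err := m.Train(strings.NewReader(corpus)); err != nil {
		return nil, err
	}
	return m.Get()
}

func main() {
	vectors, err := trainFromString(&countModel{}, "the cat sat on the mat")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(vectors["the"])
}
```

Because Train accepts an io.Reader, the same code path works for files, network streams, and in-memory strings.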

type Option

type Option struct {
	Dimension      int
	Iteration      int
	MinCount       int
	ThreadSize     int
	BatchSize      int
	Window         int
	Initlr         float64
	ToLower        bool
	Verbose        bool
	SaveVectorType SaveVectorType
}

Option stores common options for each model.
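As a rough sketch, an Option can be populated with the word2vec CLI defaults listed in the README. The types are redeclared locally so the example compiles standalone, and the mapping from flags to fields (for example `--thread` to ThreadSize) is an assumption based on the names; `defaultOption` is a hypothetical helper.

```go
package main

import "fmt"

// SaveVectorType and Option mirror the declarations documented on this page.
type SaveVectorType int

const (
	NORMAL SaveVectorType = iota
	ADD
)

type Option struct {
	Dimension      int
	Iteration      int
	MinCount       int
	ThreadSize     int
	BatchSize      int
	Window         int
	Initlr         float64
	ToLower        bool
	Verbose        bool
	SaveVectorType SaveVectorType
}

// defaultOption is a hypothetical helper that copies the word2vec CLI
// defaults; the flag-to-field mapping is assumed from the names.
func defaultOption() Option {
	return Option{
		Dimension:      10,
		Iteration:      15,
		MinCount:       5,
		ThreadSize:     8,
		BatchSize:      10000,
		Window:         5,
		Initlr:         0.025,
		SaveVectorType: NORMAL,
	}
}

func main() {
	opt := defaultOption()
	opt.Dimension = 100 // override a single field, keep the rest
	fmt.Printf("%+v\n", opt)
}
```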

type SaveVectorType

type SaveVectorType int

SaveVectorType specifies which vectors are written when the model is saved.

const (
	// NORMAL saves word vectors only.
	NORMAL SaveVectorType = iota
	// ADD adds the word vectors to the context vectors and saves the sums.
	ADD
)

func (SaveVectorType) String

func (t SaveVectorType) String() string
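The String method makes SaveVectorType satisfy fmt.Stringer. A plausible sketch follows, assuming the returned names match the `normal` and `add` values accepted by the `--save-vec` flag; the actual strings are not documented on this page.

```go
package main

import "fmt"

// SaveVectorType mirrors the type documented above.
type SaveVectorType int

const (
	NORMAL SaveVectorType = iota
	ADD
)

// String is an assumed implementation: the names are guessed from the
// --save-vec flag values and may differ from the package's own output.
func (t SaveVectorType) String() string {
	switch t {
	case NORMAL:
		return "normal"
	case ADD:
		return "add"
	default:
		return "unknown"
	}
}

func main() {
	// fmt calls String automatically for types that implement fmt.Stringer.
	fmt.Println(NORMAL, ADD)
}
```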
