model

package

Version: v0.0.0-...-ba2758a (latest) Published: Nov 20, 2019 License: Apache-2.0 Imports: 2 Imported by: 0

Repository

github.com/ike-dai/wego

README

Model

Word2Vec

Word2Vec is composed of the following modules:

Model:

Skip-Gram
CBOW

Optimizer:

Hierarchical Softmax
Negative Sampling

Usage

Word2Vec: Continuous Bag-of-Words and Skip-gram model

Usage:
  wego word2vec [flags]

Flags:
      --batchSize int       interval word size to update learning rate (default 10000)
  -d, --dimension int       dimension of word vector (default 10)
  -h, --help                help for word2vec
      --initlr float        initial learning rate (default 0.025)
  -i, --inputFile string    input file path for corpus (default "example/input.txt")
      --iter int            number of iteration (default 15)
      --lower               whether the words on corpus convert to lowercase or not
      --maxDepth int        times to track huffman tree, max-depth=0 means to track full path from root to word (for hierarchical softmax only)
      --min-count int       lower limit to filter rare words (default 5)
      --model string        which model does it use? one of: cbow|skip-gram (default "cbow")
      --optimizer string    which optimizer does it use? one of: hs|ns (default "hs")
  -o, --outputFile string   output file path to save word vectors (default "example/word_vectors.txt")
      --prof                profiling mode to check the performances
      --sample int          negative sample size(for negative sampling only) (default 5)
      --theta float         lower limit of learning rate (lr >= initlr * theta) (default 0.0001)
      --thread int          number of goroutine (default 8)
      --threshold float     threshold for subsampling (default 0.001)
      --verbose             verbose mode
  -w, --window int          context window size (default 5)
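
The `--initlr`, `--theta`, and `--batchSize` flags together describe a learning rate that decays during training but never drops below `initlr * theta`. The following is a minimal sketch of such a schedule, assuming the linear decay used by the original word2vec implementation; the exact schedule used by wego is not stated here, and `decayedLR` and its parameters are hypothetical names.

```go
package main

import "fmt"

// decayedLR returns a linearly decayed learning rate, floored at initlr*theta.
// Hypothetical helper: the linear decay over processed words is an assumption.
func decayedLR(initlr, theta float64, processed, total int) float64 {
	if total <= 0 {
		return initlr
	}
	lr := initlr * (1.0 - float64(processed)/float64(total))
	if floor := initlr * theta; lr < floor {
		return floor
	}
	return lr
}

func main() {
	const initlr, theta = 0.025, 0.0001 // CLI defaults listed above
	total := 100000
	// Re-evaluate the rate every batchSize words (default 10000).
	for processed := 0; processed <= total; processed += 10000 {
		fmt.Printf("words=%6d lr=%.7f\n", processed, decayedLR(initlr, theta, processed, total))
	}
}
```

With the defaults, the floor is reached only at the very end of training, so `--theta` mainly guards against the rate collapsing to zero.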

GloVe

GloVe is a weighted matrix factorization model for the co-occurrence map between words.

Usage

GloVe: Global Vectors for Word Representation

Usage:
  wego glove [flags]

Flags:
      --alpha float         exponent of weighting function (default 0.75)
  -d, --dimension int       dimension of word vector (default 10)
  -h, --help                help for glove
      --initlr float        initial learning rate (default 0.025)
  -i, --inputFile string    input file path for corpus (default "example/input.txt")
      --iter int            number of iteration (default 15)
      --lower               whether the words on corpus convert to lowercase or not
      --min-count int       lower limit to filter rare words (default 5)
  -o, --outputFile string   output file path to save word vectors (default "example/word_vectors.txt")
      --prof                profiling mode to check the performances
      --solver string       solver for GloVe objective. One of: sgd|adagrad (default "sgd")
      --thread int          number of goroutine (default 8)
      --verbose             verbose mode
  -w, --window int          context window size (default 5)
      --xmax int            specifying cutoff in weighting function (default 100)
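
The `--alpha` and `--xmax` flags parameterize the GloVe weighting function, which down-weights rare co-occurrences and caps the influence of frequent ones. Below is a minimal sketch of that function as defined in the GloVe paper; whether wego implements it in exactly this form is an assumption, and `weight` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// weight is the GloVe weighting function f(x) for a co-occurrence count x:
// (x/xmax)^alpha below the cutoff, and 1 at or above it.
func weight(x, xmax, alpha float64) float64 {
	if x < xmax {
		return math.Pow(x/xmax, alpha)
	}
	return 1.0
}

func main() {
	// xmax=100 and alpha=0.75 are the CLI defaults listed above.
	for _, x := range []float64{1, 10, 100, 1000} {
		fmt.Printf("f(%v) = %.4f\n", x, weight(x, 100, 0.75))
	}
}
```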

Lexvec

Usage

Lexvec: Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations

Usage:
  wego lexvec [flags]

Flags:
      --batchSize int       interval word size to update learning rate (default 10000)
  -d, --dimension int       dimension of word vector (default 10)
  -h, --help                help for lexvec
      --initlr float        initial learning rate (default 0.025)
  -i, --inputFile string    input file path for corpus (default "example/input.txt")
      --iter int            number of iteration (default 15)
      --lower               whether the words on corpus convert to lowercase or not
      --min-count int       lower limit to filter rare words (default 5)
  -o, --outputFile string   output file path to save word vectors (default "example/word_vectors.txt")
      --prof                profiling mode to check the performances
      --rel string          relation type for counting co-occurrence. One of ppmi|pmi|co|logco (default "ppmi")
      --sample int          negative sample size(for negative sampling only) (default 5)
      --save-vec string     save vector type. One of: normal|add (default "normal")
      --smooth float        smoothing value (default 0.75)
      --theta float         lower limit of learning rate (lr >= initlr * theta) (default 0.0001)
      --thread int          number of goroutine (default 12)
      --verbose             verbose mode
  -w, --window int          context window size (default 5)

Documentation

Index

func IndexPerThread(threadSize, dataSize int) []int
func NextRandom(value int) int
type Model
type Option
type SaveVectorType
- func (t SaveVectorType) String() string

Constants

This section is empty.

Variables

This section is empty.

Functions

func IndexPerThread

func IndexPerThread(threadSize, dataSize int) []int

IndexPerThread creates an interval of indices for each thread.
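
To illustrate the idea of splitting a data range across workers, here is a minimal sketch. It is not the package's implementation: `splitIndices` is a hypothetical function, and the assumption that the result is a slice of `threadSize+1` boundaries is inferred only from the `[]int` return type.

```go
package main

import "fmt"

// splitIndices returns threadSize+1 boundaries so that worker i handles the
// half-open range [bounds[i], bounds[i+1]). Hypothetical; assumes threadSize > 0.
func splitIndices(threadSize, dataSize int) []int {
	bounds := make([]int, threadSize+1)
	for i := 0; i <= threadSize; i++ {
		bounds[i] = i * dataSize / threadSize
	}
	return bounds
}

func main() {
	fmt.Println(splitIndices(4, 10)) // prints [0 2 5 7 10]
}
```

Computing each boundary from `i * dataSize / threadSize` spreads the remainder across workers instead of giving it all to the last one.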

func NextRandom

func NextRandom(value int) int

NextRandom is a linear congruential generator, like rand.Intn(window).
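
A linear congruential generator keeps a single integer state and advances it as `state = state*a + c`. The sketch below shows the technique with the multiplier and increment from the original word2vec C implementation; whether this package uses the same constants, and how it holds its state, are assumptions. The `lcg` type is hypothetical.

```go
package main

import "fmt"

// lcg is a tiny linear congruential generator: state = state*a + c (mod 2^64).
// The wrap-around comes for free from unsigned 64-bit overflow.
type lcg struct{ state uint64 }

// next advances the generator and returns a value in [0, n). Assumes n > 0.
func (g *lcg) next(n int) int {
	g.state = g.state*25214903917 + 11
	return int(g.state % uint64(n))
}

func main() {
	g := &lcg{state: 1}
	// Draw a few window offsets, as rand.Intn(window) would.
	for i := 0; i < 5; i++ {
		fmt.Println(g.next(5))
	}
}
```

Such a generator is cheap and deterministic for a given seed, which suits per-goroutine sampling, but its statistical quality is lower than that of `math/rand`.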

Types

type Model

type Model interface {
	Train(f io.Reader) error
	Save(outputFile string) error
	Get() (map[string][]float64, error)
}

Model is the interface that has Train, Save, and Get.
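
As an illustration of how the three methods fit together, the sketch below redeclares the interface locally and satisfies it with a toy type. `countModel` is hypothetical and does no real training: each "vector" is just the word's frequency.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// Model mirrors the interface documented above.
type Model interface {
	Train(f io.Reader) error
	Save(outputFile string) error
	Get() (map[string][]float64, error)
}

// countModel is a hypothetical toy: each "vector" holds the word's count.
type countModel struct{ vecs map[string][]float64 }

// Train reads whitespace-separated words and counts them.
func (m *countModel) Train(f io.Reader) error {
	m.vecs = make(map[string][]float64)
	sc := bufio.NewScanner(f)
	sc.Split(bufio.ScanWords)
	for sc.Scan() {
		w := sc.Text()
		if m.vecs[w] == nil {
			m.vecs[w] = []float64{0}
		}
		m.vecs[w][0]++
	}
	return sc.Err()
}

// Save writes one "word value" line per word to outputFile.
func (m *countModel) Save(outputFile string) error {
	var sb strings.Builder
	for w, v := range m.vecs {
		fmt.Fprintf(&sb, "%s %v\n", w, v[0])
	}
	return os.WriteFile(outputFile, []byte(sb.String()), 0o644)
}

// Get returns the vectors, or an error if Train has not run.
func (m *countModel) Get() (map[string][]float64, error) {
	if m.vecs == nil {
		return nil, fmt.Errorf("model is not trained")
	}
	return m.vecs, nil
}

func main() {
	var m Model = &countModel{}
	if err := m.Train(strings.NewReader("a b a c a b")); err != nil {
		panic(err)
	}
	vecs, _ := m.Get()
	fmt.Println(vecs["a"], vecs["b"], vecs["c"]) // prints [3] [2] [1]
}
```

Because Train takes an `io.Reader`, the same model can be fed from a file, a network stream, or an in-memory string as shown here.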

type Option

type Option struct {
	Dimension      int
	Iteration      int
	MinCount       int
	ThreadSize     int
	BatchSize      int
	Window         int
	Initlr         float64
	ToLower        bool
	Verbose        bool
	SaveVectorType SaveVectorType
}

Option stores common options for each model.
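
The sketch below shows one way to populate these fields, using the default values from the CLI help above. The types are redeclared locally so the example is self-contained; `defaultOption` is a hypothetical helper, and the mapping from flags to fields (for example `--iter` to `Iteration`, `--thread` to `ThreadSize`) is assumed from the names.

```go
package main

import "fmt"

// SaveVectorType and Option mirror the declarations documented on this page.
type SaveVectorType int

const (
	NORMAL SaveVectorType = iota
	ADD
)

type Option struct {
	Dimension      int
	Iteration      int
	MinCount       int
	ThreadSize     int
	BatchSize      int
	Window         int
	Initlr         float64
	ToLower        bool
	Verbose        bool
	SaveVectorType SaveVectorType
}

// defaultOption is a hypothetical helper that fills Option with the
// defaults shown in the word2vec CLI help.
func defaultOption() Option {
	return Option{
		Dimension:      10,
		Iteration:      15,
		MinCount:       5,
		ThreadSize:     8,
		BatchSize:      10000,
		Window:         5,
		Initlr:         0.025,
		SaveVectorType: NORMAL,
	}
}

func main() {
	opt := defaultOption()
	opt.Dimension = 100 // override only what differs from the defaults
	opt.ToLower = true
	fmt.Printf("%+v\n", opt)
}
```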

type SaveVectorType

type SaveVectorType int

SaveVectorType lists the ways a model's vectors can be saved.

const (
	// NORMAL saves word vectors only.
	NORMAL SaveVectorType = iota
	// ADD adds word vectors to context vectors, and saves the result.
	ADD
)

func (SaveVectorType) String

func (t SaveVectorType) String() string
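
The method is undocumented here, so the following is only a sketch of how such a `String` method is typically written. The returned names are assumed to match the `--save-vec` flag values (`normal|add`); the "unknown" fallback is also an assumption.

```go
package main

import "fmt"

// SaveVectorType mirrors the type documented above.
type SaveVectorType int

const (
	NORMAL SaveVectorType = iota
	ADD
)

// String returns a lowercase name for the type (assumed to match the
// --save-vec flag values), which also makes it satisfy fmt.Stringer.
func (t SaveVectorType) String() string {
	switch t {
	case NORMAL:
		return "normal"
	case ADD:
		return "add"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(NORMAL, ADD) // prints normal add
}
```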

Directories

Path	Synopsis
glove
lexvec
word2vec
