Documentation ¶
Overview ¶
Package go2vec loads word2vec embeddings.
This package can load binary word2vec files. It also supports distance and analogy queries on the embeddings.
go2vec uses gonum's C BLAS binding by default. Linking against an optimized BLAS library can noticeably improve query performance. The library to bind is selected with CGO flags. For instance, to link against OpenBLAS on Linux:
CGO_LDFLAGS="-L/path/to/OpenBLAS -lopenblas" go install github.com/gonum/blas/cgo
or Accelerate on OS X:
CGO_LDFLAGS="-framework Accelerate" go install github.com/gonum/blas/cgo
Index ¶
- func CosineSimilarity(vec1, vec2 []float32) float64
- type Embeddings
- func (e *Embeddings) Analogy(word1, word2, word3 string, limit int) ([]WordSimilarity, error)
- func (e *Embeddings) Embedding(word string) ([]float32, bool)
- func (e *Embeddings) EmbeddingSize() int
- func (e *Embeddings) Iterate(f IterFunc)
- func (e *Embeddings) Matrix() []float32
- func (e *Embeddings) Put(word string, embedding []float32) error
- func (e *Embeddings) SetBLAS(impl blas.Float32Level2)
- func (e *Embeddings) Similarity(word string, limit int) ([]WordSimilarity, error)
- func (e *Embeddings) Size() int
- func (e *Embeddings) WordIdx(word string) (int, bool)
- func (e *Embeddings) Write(w *bufio.Writer) error
- type IterFunc
- type WordSimilarity
Functions ¶
func CosineSimilarity ¶ added in v1.1.0
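No description accompanies this function here, so the following is only a minimal sketch of the standard cosine similarity computation, assuming the signature listed in the index (two []float32 vectors in, a float64 out). The helper name cosine and its handling of zero-length or mismatched vectors are choices made for this sketch, not the package's documented behaviour.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns dot(a, b) / (|a| * |b|).
// Sketch-only choice: it returns 0 when the lengths differ or when
// either vector has zero norm; go2vec may behave differently.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	fmt.Println(cosine([]float32{1, 0}, []float32{1, 0}))  // same direction
	fmt.Println(cosine([]float32{1, 0}, []float32{0, 1}))  // orthogonal
	fmt.Println(cosine([]float32{1, 0}, []float32{-1, 0})) // opposite direction
}
```

The result depends only on the angle between the vectors, so scaling either input leaves it unchanged.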
Types ¶
type Embeddings ¶
type Embeddings struct {
// contains filtered or unexported fields
}
Embeddings stores a set of word embeddings and supports common operations on them, such as retrieving similar words.
func NewEmbeddings ¶
func NewEmbeddings(embedSize int) *Embeddings
NewEmbeddings creates a set of word embeddings from scratch. This constructor should be used in conjunction with 'Put' to populate the embeddings.
func ReadWord2VecBinary ¶
func ReadWord2VecBinary(r *bufio.Reader, normalize bool) (*Embeddings, error)
ReadWord2VecBinary reads word embeddings from a binary file produced by word2vec. If normalize is true, each embedding is normalized by its L2 norm.
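To illustrate what reading and normalizing involve, here is a standalone sketch that does not use go2vec. It assumes the layout commonly written by the original word2vec tool: a text header with the word count and dimensionality, then for each word the word itself, a space, and the raw float32 values, optionally followed by a newline. Little-endian byte order and the names readBinary, readBytes, normalizeL2, and encodeSample are assumptions of this sketch; how go2vec parses files internally is not documented here.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
	"strings"
)

// normalizeL2 scales vec in place to unit L2 norm; zero vectors are left as-is.
func normalizeL2(vec []float32) {
	var sum float64
	for _, v := range vec {
		sum += float64(v) * float64(v)
	}
	if sum == 0 {
		return
	}
	norm := float32(math.Sqrt(sum))
	for i := range vec {
		vec[i] /= norm
	}
}

// readBinary parses the assumed layout described above.
func readBinary(r *bufio.Reader, normalize bool) (map[string][]float32, error) {
	header, err := r.ReadString('\n')
	if err != nil {
		return nil, err
	}
	var nWords, dims int
	if _, err := fmt.Sscanf(header, "%d %d", &nWords, &dims); err != nil {
		return nil, err
	}
	if nWords < 0 || dims <= 0 {
		return nil, fmt.Errorf("invalid header %q", header)
	}
	emb := make(map[string][]float32, nWords)
	for i := 0; i < nWords; i++ {
		word, err := r.ReadString(' ')
		if err != nil {
			return nil, err
		}
		// Drop the trailing space and any newline left by the previous record.
		word = strings.TrimSpace(word)
		vec := make([]float32, dims)
		if err := binary.Read(r, binary.LittleEndian, vec); err != nil {
			return nil, err
		}
		if normalize {
			normalizeL2(vec)
		}
		emb[word] = vec
	}
	return emb, nil
}

// readBytes is a convenience wrapper for in-memory data.
func readBytes(data []byte, normalize bool) (map[string][]float32, error) {
	return readBinary(bufio.NewReader(bytes.NewReader(data)), normalize)
}

// encodeSample builds a tiny two-word file in the assumed layout.
func encodeSample() []byte {
	var buf bytes.Buffer
	buf.WriteString("2 2\n")
	records := []struct {
		word string
		vec  []float32
	}{
		{"a", []float32{3, 4}},
		{"b", []float32{0, 2}},
	}
	for _, rec := range records {
		buf.WriteString(rec.word + " ")
		_ = binary.Write(&buf, binary.LittleEndian, rec.vec) // bytes.Buffer writes do not fail
		buf.WriteByte('\n')
	}
	return buf.Bytes()
}

func main() {
	emb, err := readBytes(encodeSample(), true)
	if err != nil {
		panic(err)
	}
	fmt.Println(emb["a"], emb["b"])
}
```

With unit-length vectors, the dot product of two embeddings equals their cosine similarity, which is one common reason to normalize at load time.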
func (*Embeddings) Analogy ¶
func (e *Embeddings) Analogy(word1, word2, word3 string, limit int) ([]WordSimilarity, error)
Analogy performs word analogy queries.
Consider an analogy of the form 'word1' is to 'word2' as 'word3' is to 'word4'. This method returns candidates for 'word4' based on 'word1..3'.
If 'e1' is the embedding of 'word1', etc., then the embedding 'e4 = (e2 - e1) + e3' is computed. Then the words with embeddings that are the most similar to e4 are returned.
The query words are never returned as a result.
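The computation described above can be sketched on toy data without go2vec. The map-based store, the scored type, the analogy and cosine helpers, and the two-dimensional vectors are all hypothetical and chosen only so the arithmetic is easy to follow; the package itself stores embeddings differently.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// scored pairs a candidate word with its similarity to the target vector.
type scored struct {
	Word  string
	Score float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either has zero norm.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// analogy computes e4 = (e2 - e1) + e3 and ranks all other words by
// cosine similarity to e4. The three query words are skipped.
// It assumes all vectors in emb have the same length.
func analogy(emb map[string][]float32, w1, w2, w3 string, limit int) ([]scored, error) {
	e1, ok1 := emb[w1]
	e2, ok2 := emb[w2]
	e3, ok3 := emb[w3]
	if !ok1 || !ok2 || !ok3 {
		return nil, fmt.Errorf("unknown query word among %q, %q, %q", w1, w2, w3)
	}
	target := make([]float32, len(e1))
	for i := range target {
		target[i] = (e2[i] - e1[i]) + e3[i]
	}
	var results []scored
	for word, vec := range emb {
		if word == w1 || word == w2 || word == w3 {
			continue
		}
		results = append(results, scored{Word: word, Score: cosine(target, vec)})
	}
	sort.Slice(results, func(i, j int) bool { return results[i].Score > results[j].Score })
	if limit >= 0 && limit < len(results) {
		results = results[:limit]
	}
	return results, nil
}

// toyEmbeddings returns hand-picked vectors for illustration only.
func toyEmbeddings() map[string][]float32 {
	return map[string][]float32{
		"man":   {1, 0},
		"woman": {1, 1},
		"king":  {3, 0},
		"queen": {3, 1},
		"apple": {0, -1},
	}
}

func main() {
	res, err := analogy(toyEmbeddings(), "man", "woman", "king", 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(res[0].Word)
}
```

In this toy data, e4 = ([1 1] - [1 0]) + [3 0] = [3 1], which coincides with the vector for "queen", so that word ranks first.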
func (*Embeddings) Embedding ¶
func (e *Embeddings) Embedding(word string) ([]float32, bool)
Embedding returns the embedding for a particular word. If the word is unknown, the second return value will be false.
func (*Embeddings) EmbeddingSize ¶
func (e *Embeddings) EmbeddingSize() int
EmbeddingSize returns the embedding size.
func (*Embeddings) Iterate ¶
func (e *Embeddings) Iterate(f IterFunc)
Iterate applies the provided iteration function to all word embeddings.
func (*Embeddings) Matrix ¶
func (e *Embeddings) Matrix() []float32
func (*Embeddings) Put ¶
func (e *Embeddings) Put(word string, embedding []float32) error
Put adds a word and its embedding to the set. The new word can be queried after the call returns.
func (*Embeddings) SetBLAS ¶
func (e *Embeddings) SetBLAS(impl blas.Float32Level2)
SetBLAS sets the BLAS implementation to use (default: C BLAS).
func (*Embeddings) Similarity ¶
func (e *Embeddings) Similarity(word string, limit int) ([]WordSimilarity, error)
Similarity finds words whose embeddings are similar to that of the given word. The 'limit' argument specifies how many words should be returned. The returned slice is ordered by similarity.
The query word is never returned as a result.
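A similarity query over unit-length embeddings can be expressed as a matrix-vector product followed by a sort, which is the kind of operation a Level 2 BLAS routine accelerates. The sketch below uses a plain loop instead of BLAS and a row-major []float32 matrix; the wordSim type, the similar function, and the toy data are hypothetical, and whether go2vec computes results exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// wordSim pairs a word with its similarity to the query word.
type wordSim struct {
	Word       string
	Similarity float32
}

// similar ranks all words except the query by dot product with the
// query's row. Rows are assumed to be unit-normalized, so the dot
// product equals the cosine similarity.
func similar(words []string, matrix []float32, dims int, query string, limit int) ([]wordSim, error) {
	q := -1
	for i, w := range words {
		if w == query {
			q = i
			break
		}
	}
	if q < 0 {
		return nil, fmt.Errorf("unknown word %q", query)
	}
	qv := matrix[q*dims : (q+1)*dims]

	results := make([]wordSim, 0, len(words)-1)
	for i, w := range words {
		if i == q {
			continue // the query word itself is never a result
		}
		row := matrix[i*dims : (i+1)*dims]
		var dot float32
		for j := range row {
			dot += row[j] * qv[j]
		}
		results = append(results, wordSim{Word: w, Similarity: dot})
	}
	// Most similar first.
	sort.Slice(results, func(i, j int) bool {
		return results[i].Similarity > results[j].Similarity
	})
	if limit >= 0 && limit < len(results) {
		results = results[:limit]
	}
	return results, nil
}

// toyData returns three unit-length, two-dimensional rows.
func toyData() ([]string, []float32) {
	words := []string{"cat", "dog", "car"}
	matrix := []float32{
		1, 0, // cat
		0.8, 0.6, // dog
		0, 1, // car
	}
	return words, matrix
}

func main() {
	words, matrix := toyData()
	res, err := similar(words, matrix, 2, "cat", 2)
	if err != nil {
		panic(err)
	}
	for _, r := range res {
		fmt.Println(r.Word, r.Similarity)
	}
}
```

Sorting every candidate is the simplest approach; for large vocabularies and small limits, a bounded heap would avoid the full sort.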
func (*Embeddings) Size ¶
func (e *Embeddings) Size() int
Size returns the number of words in the embeddings.
type IterFunc ¶
IterFunc is a function for iterating over word embeddings. The function should return 'false' if the iteration should be stopped.
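The type's exact signature is not shown here, so the following sketch assumes a callback that receives a word and its embedding and returns a bool. The names iterFunc, entry, iterate, and sampleEntries are hypothetical; the point is only the early-stop contract, where returning false ends the iteration.

```go
package main

import "fmt"

// iterFunc is an assumed callback shape: return false to stop iterating.
type iterFunc func(word string, embedding []float32) bool

// entry is a hypothetical word/vector pair used only in this sketch.
type entry struct {
	word string
	vec  []float32
}

// iterate calls f for each entry in order and stops as soon as f returns false.
func iterate(entries []entry, f iterFunc) {
	for _, e := range entries {
		if !f(e.word, e.vec) {
			return
		}
	}
}

// sampleEntries returns three toy entries.
func sampleEntries() []entry {
	return []entry{
		{"a", []float32{1, 0}},
		{"b", []float32{0, 1}},
		{"c", []float32{1, 1}},
	}
}

func main() {
	visited := 0
	iterate(sampleEntries(), func(word string, _ []float32) bool {
		visited++
		fmt.Println(word)
		return word != "b" // stop once "b" has been seen
	})
	fmt.Println("visited:", visited)
}
```

Here the callback is invoked for "a" and "b" and never sees "c".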
type WordSimilarity ¶
WordSimilarity stores the similarity of a word compared to a query word.