word2vec

package module
v1.0.2 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 7, 2022 License: MIT Imports: 16 Imported by: 0

README

word2vec

word2vec is a Go package which provides functions for querying word2vec models (see https://code.google.com/p/word2vec). Any binary word2vec model file can be loaded and queried.

Requirements

Installation

If you haven't set up Go before, you first need to set a GOPATH (see https://golang.org/doc/code.html#GOPATH).

To fetch and build the code:

$ go get code.sajari.com/word2vec/...

This will build the command line tools (in particular word-calc, word-server, word-client) into $GOPATH/bin (assumed to be in your PATH already).

Usage

word-calc

The word-calc tool is a quick way to perform basic word calculations on a word2vec model. For instance, vec(king) - vec(man) + vec(woman) is equivalent to:

$ word-calc -model /path/to/model.bin -add king,woman -sub man

See word-calc -h for more details. Note that word-calc loads the model on every invocation, so it can appear quite slow. Use word-server and word-client for better performance when running multiple queries against the same model.
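
The arithmetic behind an expression such as vec(king) - vec(man) + vec(woman) can be sketched in plain Go. This is a minimal illustration under stated assumptions, not the package's implementation: toyModel, evalExpr and the three-dimensional vectors are invented for the example, whereas a real model holds learned vectors with many more dimensions.

```go
package main

import "fmt"

// toyModel is a hypothetical stand-in for a loaded model (word -> vector).
// The dimensions loosely mean: royal, male, female.
var toyModel = map[string][]float32{
	"king":  {1, 1, 0},
	"man":   {0, 1, 0},
	"woman": {0, 0, 1},
	"queen": {1, 0, 1},
}

// evalExpr computes the weighted sum of the vectors named in expr.
// It fails if a word is missing from the model.
func evalExpr(model map[string][]float32, expr map[string]float32, dim int) ([]float32, error) {
	out := make([]float32, dim)
	for word, weight := range expr {
		vec, ok := model[word]
		if !ok {
			return nil, fmt.Errorf("word not found: %q", word)
		}
		for i, x := range vec {
			out[i] += weight * x
		}
	}
	return out, nil
}

func main() {
	// vec(king) - vec(man) + vec(woman)
	expr := map[string]float32{"king": 1, "man": -1, "woman": 1}
	v, err := evalExpr(toyModel, expr, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v) // [1 0 1], which equals the toy vector for "queen"
}
```

In this toy setup the result coincides exactly with the vector for "queen"; with a real model the result is only close to some word vectors, which is why the tools rank candidates by cosine similarity.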

word-server and word-client

The word-server tool (see cmd/word-server) creates an HTTP server that wraps a word2vec model. The model can then be queried from Go using a Client, or from the command line using the word-client tool (see cmd/word-client).

$ word-server -model /path/to/model.bin -listen localhost:1234

A simple code example using Client:

c := word2vec.Client{Addr: "localhost:1234"}

// Create an expression.
expr := word2vec.Expr{}
expr.Add(1, "king")
expr.Add(-1, "man")
expr.Add(1, "woman")

// Find the most similar result by cosine similarity.
matches, err := c.CosN(expr, 1)
if err != nil {
	log.Fatalf("error evaluating cosine similarity: %v", err)
}

API Example

Alternatively you can interact with a word2vec model directly in your code:

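// Load the model from an io.Reader (e.g. a file).
// Note: FromReader is documented below as FromReader(r io.Reader, normalize bool),
// so this call may also need the normalize argument.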
model, err := word2vec.FromReader(r)
if err != nil {
	log.Fatalf("error loading model: %v", err)
}

// Create an expression.
expr := word2vec.Expr{}
expr.Add(1, "king")
expr.Add(-1, "man")
expr.Add(1, "woman")

// Find the most similar result by cosine similarity.
matches, err := model.CosN(expr, 1)
if err != nil {
	log.Fatalf("error evaluating cosine similarity: %v", err)
}

Documentation

Overview

Package word2vec provides functionality for reading binary word2vec models and performing cosine similarity queries (see https://code.google.com/p/word2vec/).

Functions

func Add

func Add(e Expr, weight float32, words []string)

Add is a convenience function for adding multiple words, all with the same weight, to an Expr.

func AddWeight

func AddWeight(e Expr, weights []float32, words []string)

AddWeight is a convenience function for adding multiple weighted words to an Expr.

func MultiCosN

func MultiCosN(m *Model, exprs []Expr, n int) ([][]Match, error)

MultiCosN takes a list of expressions and computes the n most similar words for each.

func NewServer

func NewServer(c Coser) http.Handler

NewServer creates a new word2vec server which exports endpoints for performing similarity queries on a word2vec Model.

Types

type Client

type Client struct {
	Addr string
}

Client is a type which implements Coser and evaluates Expr similarity queries using a word2vec Server (see above).

func (Client) Cos

func (c Client) Cos(x, y Expr) (float32, error)

Cos implements Coser.

func (Client) CosN

func (c Client) CosN(e Expr, n int) ([]Match, error)

CosN implements Coser.

func (Client) Coses

func (c Client) Coses(pairs [][2]Expr) ([]float32, error)

Coses implements Coser.

type Coser

type Coser interface {
	// Cos computes the cosine similarity of the expressions.
	Cos(e, f Expr) (float32, error)

	// Coses computes the cosine similarity of pairs of expressions.
	Coses(pairs [][2]Expr) ([]float32, error)

	// CosN computes the N most similar words to the expression.
	CosN(e Expr, n int) ([]Match, error)
}

Coser is an interface which defines methods which can evaluate cosine similarity between Exprs.

func NewCache

func NewCache(c Coser) Coser

NewCache returns a Coser which will cache repeated calls to the Cos method, particularly useful when using Client.
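
The caching idea can be illustrated with a small decorator. This is a sketch under stated assumptions, not the package's NewCache: the coser interface here is a simplified, hypothetical stand-in that compares plain words rather than Exprs, and countingCoser, cache and newCache are invented names.

```go
package main

import (
	"fmt"
	"sync"
)

// coser is a simplified, hypothetical stand-in for the Coser interface.
type coser interface {
	Cos(a, b string) (float32, error)
}

// countingCoser pretends to be a slow backend (for example a remote server)
// and counts how often it is actually called.
type countingCoser struct{ calls int }

func (c *countingCoser) Cos(a, b string) (float32, error) {
	c.calls++
	if a == b {
		return 1, nil
	}
	return 0.5, nil
}

// cache wraps another coser and remembers results per (a, b) pair.
type cache struct {
	mu    sync.Mutex
	inner coser
	seen  map[[2]string]float32
}

func newCache(inner coser) *cache {
	return &cache{inner: inner, seen: make(map[[2]string]float32)}
}

func (c *cache) Cos(a, b string) (float32, error) {
	key := [2]string{a, b}
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.seen[key]; ok {
		return v, nil
	}
	v, err := c.inner.Cos(a, b)
	if err != nil {
		return 0, err // errors are not cached in this sketch
	}
	c.seen[key] = v
	return v, nil
}

func main() {
	backend := &countingCoser{}
	c := newCache(backend)
	c.Cos("king", "queen")
	c.Cos("king", "queen") // served from the cache
	fmt.Println(backend.calls) // 1
}
```

A cache like this pays off most when each uncached call involves a network round trip, which is the situation the description above mentions for Client.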

type Expr

type Expr map[string]float32

Expr is a type which represents a linear expression of (weight, word) pairs which can be evaluated to a vector by a word2vec Model.

func (Expr) Add

func (e Expr) Add(weight float32, word string)

Add appends the given word with specified weight to the expression. If the word already exists in the expression, then the weights are added.
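
The merging behaviour described above follows naturally from the map representation. A minimal sketch, assuming Add simply accumulates into the map (expr and add are local stand-ins, not the package's code):

```go
package main

import "fmt"

// expr mirrors the documented shape of Expr (map[string]float32).
type expr map[string]float32

// add accumulates weight onto word, so a repeated word becomes a single term.
func (e expr) add(weight float32, word string) {
	e[word] += weight
}

func main() {
	e := expr{} // must be initialised: writing to a nil map panics in Go
	e.add(1, "king")
	e.add(0.5, "king")
	e.add(-1, "man")
	fmt.Println(len(e), e["king"], e["man"]) // 2 1.5 -1
}
```

This is also a plausible reason the examples in the README start from word2vec.Expr{} rather than a zero-value variable.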

func (Expr) Eval

func (e Expr) Eval(m *Model) (Vector, error)

Eval evaluates the Expr to a Vector using a Model.

type Mapper

type Mapper interface {
	Map(words []string) map[string]Vector
}

Mapper is an interface which defines a method which can return a mapping of word -> Vector for each word in words.

type Match

type Match struct {
	Word  string  `json:"word"`
	Score float32 `json:"score"`
}

Match is a type which represents a pairing of a word and score indicating the similarity of this word against a search word.

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model is a type which represents a word2vec Model and implements the Coser and Mapper interfaces.

func FromReader

func FromReader(r io.Reader, normalize bool) (*Model, error)

FromReader creates a Model using the binary model data provided by the io.Reader.
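
The page does not describe the binary format itself. As a rough sketch, assuming the conventional layout written by the original word2vec tool (a text header "size dim", then for each word the word, a space, dim little-endian float32 values and usually a trailing newline), a reader could look like this. readModel, sampleModel and loadSample are hypothetical helpers, and the code is not taken from the package.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"strings"
)

// readModel parses the assumed binary layout into a word -> vector map.
func readModel(r io.Reader) (map[string][]float32, error) {
	br := bufio.NewReader(r)
	var size, dim int
	if _, err := fmt.Fscanln(br, &size, &dim); err != nil {
		return nil, fmt.Errorf("reading header: %w", err)
	}
	model := make(map[string][]float32, size)
	for i := 0; i < size; i++ {
		word, err := br.ReadString(' ')
		if err != nil {
			return nil, fmt.Errorf("reading word %d: %w", i, err)
		}
		// Drop the separator and any newline left over from the previous vector.
		word = strings.TrimSpace(word)
		vec := make([]float32, dim)
		if err := binary.Read(br, binary.LittleEndian, vec); err != nil {
			return nil, fmt.Errorf("reading vector for %q: %w", word, err)
		}
		model[word] = vec
	}
	return model, nil
}

// sampleModel builds a tiny two-word model in the assumed layout.
func sampleModel() []byte {
	var buf bytes.Buffer
	buf.WriteString("2 2\n")
	words := []string{"king", "queen"}
	vecs := [][]float32{{1, 2}, {3, 4}}
	for i, w := range words {
		buf.WriteString(w + " ")
		// Writes to a bytes.Buffer cannot fail, so the error is ignored here.
		binary.Write(&buf, binary.LittleEndian, vecs[i])
		buf.WriteByte('\n')
	}
	return buf.Bytes()
}

// loadSample parses the in-memory sample model.
func loadSample() (map[string][]float32, error) {
	return readModel(bytes.NewReader(sampleModel()))
}

func main() {
	model, err := loadSample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(model), model["queen"]) // 2 [3 4]
}
```

With the package itself, the same role is played by passing an opened file (or any io.Reader) to FromReader.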

func (*Model) Cos

func (m *Model) Cos(a, b Expr) (float32, error)

Cos returns the cosine similarity of the given expressions.

func (*Model) CosN

func (m *Model) CosN(e Expr, n int) ([]Match, error)

CosN computes the n most similar words to the expression. Returns an error if the expression could not be evaluated.
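
Conceptually, a "most similar" query scores every word against the query vector and keeps the best n. The following is a brute-force sketch of that idea, not the package's algorithm: match, cosine and topN are local, hypothetical names, and the tie-break by word is a choice made here to keep the output deterministic.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// match mirrors the documented shape of Match (word plus score).
type match struct {
	Word  string
	Score float32
}

// cosine returns the cosine similarity of u and v, which must have equal length.
// A zero vector yields 0 in this sketch.
func cosine(u, v []float32) float32 {
	var dot, nu, nv float64
	for i := range u {
		dot += float64(u[i]) * float64(v[i])
		nu += float64(u[i]) * float64(u[i])
		nv += float64(v[i]) * float64(v[i])
	}
	if nu == 0 || nv == 0 {
		return 0
	}
	return float32(dot / math.Sqrt(nu*nv))
}

// topN scores every word in the model and returns the n best matches.
func topN(model map[string][]float32, query []float32, n int) []match {
	matches := make([]match, 0, len(model))
	for w, v := range model {
		matches = append(matches, match{Word: w, Score: cosine(query, v)})
	}
	sort.Slice(matches, func(i, j int) bool {
		if matches[i].Score != matches[j].Score {
			return matches[i].Score > matches[j].Score
		}
		return matches[i].Word < matches[j].Word
	})
	if n < 0 {
		n = 0
	}
	if n > len(matches) {
		n = len(matches)
	}
	return matches[:n]
}

func main() {
	model := map[string][]float32{
		"king":  {1, 1, 0},
		"man":   {0, 1, 0},
		"queen": {1, 0, 1},
	}
	for _, m := range topN(model, []float32{1, 0, 1}, 2) {
		fmt.Printf("%s %.2f\n", m.Word, m.Score)
	}
	// Prints:
	// queen 1.00
	// king 0.50
}
```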

func (*Model) Coses

func (m *Model) Coses(pairs [][2]Expr) ([]float32, error)

Coses returns the cosine similarity of each pair of expressions in the list. Returns immediately if an error occurs.

func (*Model) Dim

func (m *Model) Dim() int

Dim returns the dimension of the vectors in the model.

func (*Model) Eval

func (m *Model) Eval(expr Expr) (Vector, error)

Eval constructs a vector by evaluating the expression vector. Returns an error if a word is not in the model.

func (*Model) Map

func (m *Model) Map(words []string) map[string]Vector

Map returns a mapping word -> Vector for each word in `words`. Unknown words are ignored.

func (*Model) Size

func (m *Model) Size() int

Size returns the number of words in the model.

type NotFoundError

type NotFoundError struct {
	Word string
}

NotFoundError is an error returned from Model functions when an input word is not in the model.
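
A caller can tell an unknown word apart from other failures with errors.As. The sketch below uses a local stand-in type with the same shape as NotFoundError; whether the package returns the error by value or by pointer, and whether it wraps it, is not stated on this page, so the value form, the message text and the lookup helper are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// notFoundError is a local stand-in mirroring the documented NotFoundError.
type notFoundError struct {
	Word string
}

// Error uses an invented message format.
func (e notFoundError) Error() string {
	return fmt.Sprintf("word not found: %q", e.Word)
}

// lookup is a hypothetical helper that fails for words outside the vocabulary.
func lookup(vocab map[string]bool, word string) error {
	if !vocab[word] {
		return fmt.Errorf("evaluating expression: %w", notFoundError{Word: word})
	}
	return nil
}

// missingWord reports which word was unknown, if err is (or wraps) a notFoundError.
func missingWord(err error) (string, bool) {
	var nf notFoundError
	if errors.As(err, &nf) {
		return nf.Word, true
	}
	return "", false
}

func main() {
	vocab := map[string]bool{"king": true}
	if w, ok := missingWord(lookup(vocab, "qween")); ok {
		fmt.Println("unknown word:", w) // unknown word: qween
	}
}
```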

func (NotFoundError) Error

func (e NotFoundError) Error() string

type Vector

type Vector []float32

Vector is a type which represents a word vector.

func (Vector) Add

func (v Vector) Add(a float32, u Vector)

Add performs v += a * u (in-place).

func (Vector) Dot

func (v Vector) Dot(u Vector) float32

Dot computes the dot product with u.

func (Vector) Norm

func (v Vector) Norm() float32

Norm computes the Euclidean norm of the vector.

func (Vector) Normalise

func (v Vector) Normalise()

Normalise normalises the vector in-place.
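
The four Vector operations are small enough to sketch directly. This is an illustration of the documented behaviour rather than the package's source: vec and its lower-case methods are local stand-ins, the vectors are assumed to have equal length, and leaving a zero vector untouched in normalise is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// vec mirrors the documented shape of Vector ([]float32).
type vec []float32

// add performs v += a * u in place.
func (v vec) add(a float32, u vec) {
	for i := range v {
		v[i] += a * u[i]
	}
}

// dot computes the dot product of v and u.
func (v vec) dot(u vec) float32 {
	var sum float32
	for i := range v {
		sum += v[i] * u[i]
	}
	return sum
}

// norm computes the Euclidean norm of v.
func (v vec) norm() float32 {
	return float32(math.Sqrt(float64(v.dot(v))))
}

// normalise scales v to unit length in place; a zero vector is left unchanged.
func (v vec) normalise() {
	n := v.norm()
	if n == 0 {
		return
	}
	for i := range v {
		v[i] /= n
	}
}

func main() {
	v := vec{3, 4}
	fmt.Println(v.norm()) // 5

	u := vec{1, 0}
	u.add(2, vec{0, 1}) // u is now {1, 2}
	fmt.Println(u)      // [1 2]

	v.normalise()
	fmt.Printf("%.1f %.1f\n", v[0], v[1]) // 0.6 0.8
}
```

Once two vectors are normalised, their dot product equals their cosine similarity, which is why normalising vectors up front makes repeated similarity queries cheaper.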

Directories

Path             Synopsis
cmd/partition    partition is a tool which reads word2vec classes output and allows you to query the data.
cmd/word-calc    wordcalc is a tool which reads word2vec binary models and allows you to do basic calculations with lists of query words.
cmd/word-client  word2vec-client is a tool which queries a `word-server` HTTP server to do computations with a word2vec model.
cmd/word-server  word-server creates an HTTP server which exports endpoints for querying a word2vec model.