gowe

package module

v0.3.0 Latest Latest Go to latest Published: May 18, 2024 License: 0BSD Imports: 12 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/jackiedeng0/gowe

Links

Open Source Insights

README ¶

gowe - Word Embedding Utilities for Go

gowe is a Go package for using word embeddings.

Motivation

There are existing packages for word embeddings in Go that have inspired this package, but most have not seen updates for a while:

This motivates the creation of a new package built for modern Go (1.22+).

API

Import using:

import "github.com/jackiedeng0/gowe"

Load plaintext file to a float 32 model:

model := newFloatModel[float32]()
err := model.FromPlainFile("glove.6B.50d.txt", false)
// You can retrieve this model at https://github.com/stanfordnlp/GloVe/
// 'false' because the file doesn't have a "<size> <dim>" description

// Get the vector embedding for a word
fmt.Println(model.Vector("cat"))
// [1.45281 -0.50108 -0.53714 -0.015697 0.22191 ... ]

// Get the similarity (cosine) between two words
fmt.Printf("%0.3f\n", model.Similarity("cat", "dog"))
// 0.922

// Within a list of words, exhaustively search and rank the N most similar words
words := []string{"dog", "apple", "lincoln", "whisker", "road", "cheetah"}
nearest, err := model.NNearestIn("cat", words, 3)
fmt.Println(nearest)
// [dog cheetah apple]

Load plaintext file to a quantized int model (int8, int16, int32 supported):

model := newIntModel[int16]()
err := model.FromPlainFile("glove.6B.50d.txt", false, 5.0)
// Requires an additional float64 argument for the maximum magnitude of any
// scalar value - in this case, it was 5.0. For a normalized model, this would
// be 1.0

Load binary file to float and int models respectively:

floatModel := newFloatModel[float32]()
err := model.FromBinaryFile("model.bin", 32)
// Description is always provided, so we just need to specify the bitSize of
// floating points in the file

intModel := newIntModel[int8]()
err := model.FromBinaryFile("model.bin", 32, 2.0)

Status

Load plaintext model files as float64 embedding models
Float and Int generic vector types
Quantization and Dequantization
Loading models as any vector type
Loading binary model files

Documentation ¶

Overview ¶

Package gowe provides types and functions to represent a word embedding model and to consume models encoded in multiple formats. gowe can consume models in the following formats:

plaintext (plain) e.g. "the 0.418 0.24968 ..." The first line may be the vocabulary size and dim e.g. "300000 128", in this case, we skip it

Index ¶

func NNearestIn[T VectorScalar, M Model[T]](m M, s string, vocab []string, n uint) ([]string, error)
func QuantizationShift[I IntScalar](maxMagnitude float64) uint8
func RankSimilarity[T VectorScalar, M Model[T]](m M, s string, vocab []string) []string
type FloatModel
- func NewFloatModel[F FloatScalar]() *FloatModel[F]
type FloatScalar
type FloatVector
- func DequantizeIntVector[F FloatScalar, I IntScalar](v IntVector[I]) FloatVector[F]
type IntModel
- func NewIntModel[I IntScalar]() *IntModel[I]
type IntScalar
type IntVector
- func QuantizeFloatVector[I IntScalar, F FloatScalar](v FloatVector[F], shift uint8) IntVector[I]
type Model
type VectorScalar

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func NNearestIn ¶ added in v0.2.0

func NNearestIn[T VectorScalar, M Model[T]](m M, s string, vocab []string, n uint) ([]string, error)

func QuantizationShift ¶ added in v0.2.0

func QuantizationShift[I IntScalar](maxMagnitude float64) uint8

QuantizationShift() determines the max integer bitshift with respect to an expected maximum magnitude (must be positive) and then a bit more room for vector operations

For example, if we want to convert a float64 vector [-2.89, 0.2] to an int8 vector, we might want to use a maximum magnitude of 3.00 so we call

v := FloatVector[float64]{scalars: []float64{-2.89, 0.2},}
qV := QuantizeFloatVector[int8](v, 3.0)

So we take the ceiling of Log2(3.0) which is ceil(1.58) = 2, so we need at 2 bits to represent the whole number component of the magnitude. We then reserve 2 extra bits for math and 1 bit for the sign, and our resulting quantized ints will look like this:

| sign[1] | extra[2] | whole[2] | fractional[3]

Our resulting shift will be 3 with a decimal precision of 2^-3 = 0.125 If we convert the int8 value back to float64, we should get:

[-2.875, 0.25]

We can then use this shift in QuantizeFloatVector to quantize a whole group of vectors

func RankSimilarity ¶ added in v0.2.0

func RankSimilarity[T VectorScalar, M Model[T]](m M, s string, vocab []string) []string

Types ¶

type FloatModel ¶ added in v0.2.0

type FloatModel[F FloatScalar] struct {
	// contains filtered or unexported fields
}

* FloatModel *

func NewFloatModel ¶ added in v0.2.0

func NewFloatModel[F FloatScalar]() *FloatModel[F]

func (*FloatModel[F]) Dimensions ¶ added in v0.2.0

func (m *FloatModel[F]) Dimensions() uint

func (*FloatModel[F]) FromBinaryFile ¶ added in v0.3.0

func (m *FloatModel[F]) FromBinaryFile(
	p string, bitSize int, _ ...interface{}) error

func (*FloatModel[F]) FromPlainFile ¶ added in v0.2.0

func (m *FloatModel[F]) FromPlainFile(
	p string, desc bool, _ ...interface{}) error

func (*FloatModel[F]) Similarity ¶ added in v0.2.0

func (m *FloatModel[F]) Similarity(s, t string) float64

func (*FloatModel[F]) Vector ¶ added in v0.2.0

func (m *FloatModel[F]) Vector(s string) []F

func (*FloatModel[F]) VocabularySize ¶ added in v0.2.0

func (m *FloatModel[F]) VocabularySize() uint

type FloatScalar ¶ added in v0.2.0

type FloatScalar interface {
	float32 | float64
}

type FloatVector ¶

type FloatVector[F FloatScalar] struct {
	// contains filtered or unexported fields
}

func DequantizeIntVector ¶

func DequantizeIntVector[F FloatScalar, I IntScalar](
	v IntVector[I]) FloatVector[F]

func (FloatVector[F]) Add ¶

func (v FloatVector[F]) Add(u FloatVector[F]) FloatVector[F]

func (FloatVector[F]) CosineSimilarity ¶

func (v FloatVector[F]) CosineSimilarity(u FloatVector[F]) float64

Fused-loop implementation of CosineSimilarity

func (FloatVector[F]) Dot ¶

func (v FloatVector[F]) Dot(u FloatVector[F]) float64

func (FloatVector[F]) Magnitude ¶

func (v FloatVector[F]) Magnitude() float64

func (FloatVector[F]) Normalize ¶

func (v FloatVector[F]) Normalize() FloatVector[F]

func (FloatVector[F]) Subtract ¶

func (v FloatVector[F]) Subtract(u FloatVector[F]) FloatVector[F]

type IntModel ¶ added in v0.2.0

type IntModel[I IntScalar] struct {
	// contains filtered or unexported fields
}

* IntModel *

func NewIntModel ¶ added in v0.2.0

func NewIntModel[I IntScalar]() *IntModel[I]

func (*IntModel[I]) Dimensions ¶ added in v0.2.0

func (m *IntModel[I]) Dimensions() uint

func (*IntModel[I]) FromBinaryFile ¶ added in v0.3.0

func (m *IntModel[I]) FromBinaryFile(
	p string, bitSize int, opts ...interface{}) error

func (*IntModel[I]) FromPlainFile ¶ added in v0.2.0

func (m *IntModel[I]) FromPlainFile(
	p string, desc bool, opts ...interface{}) error

func (*IntModel[I]) Similarity ¶ added in v0.2.0

func (m *IntModel[I]) Similarity(s, t string) float64

func (*IntModel[I]) Vector ¶ added in v0.2.0

func (m *IntModel[I]) Vector(s string) []I

func (*IntModel[I]) VocabularySize ¶ added in v0.2.0

func (m *IntModel[I]) VocabularySize() uint

type IntScalar ¶ added in v0.2.0

type IntScalar interface {
	int8 | int16 | int32
}

type IntVector ¶

type IntVector[I int8 | int16 | int32] struct {
	// contains filtered or unexported fields
}

IntVector is used for quantized representations of FloatVectors, the shift value represents how many bits shifted the integer is from the underlying float's real magnitude value, in other words, the number of bits that can the decimal portion of the scalars.

func QuantizeFloatVector ¶

func QuantizeFloatVector[I IntScalar, F FloatScalar](
	v FloatVector[F], shift uint8) IntVector[I]

func (IntVector[I]) Add ¶

func (v IntVector[I]) Add(u IntVector[I]) IntVector[I]

Never operate on IntVectors of different shifts, this operation is designed to be fast so it doesn't check it.

func (IntVector[int32]) CosineSimilarity ¶

func (v IntVector[int32]) CosineSimilarity(u IntVector[int32]) float64

Fused-loop implementation of CosineSimilarity

func (IntVector[I]) Dot ¶

func (v IntVector[I]) Dot(u IntVector[I]) float64

func (IntVector[I]) Magnitude ¶

func (v IntVector[I]) Magnitude() float64

func (IntVector[I]) Normalize ¶

func (v IntVector[I]) Normalize() IntVector[I]

func (IntVector[I]) Subtract ¶

func (v IntVector[I]) Subtract(u IntVector[I]) IntVector[I]

type Model ¶

type Model[T VectorScalar] interface {
	// Loads model from plaintext file
	FromPlainFile(p string, desc bool, opts ...interface{}) error
	// Loads model from binary file
	// Binary files must have a description and scalars can be either float32
	// or float64, and the user passes that in via bitSize. If bitSize is not
	// 64, it defaults to 32, which is the standard.
	FromBinaryFile(p string, bitSize int, opts ...interface{}) error
	// Returns vector as array of scalars for a word. Note that for IntModels,
	// this will return the shifted quantized ints.
	Vector(s string) []T
	// Returns dimensions
	Dimensions() uint
	// Returns size of vocabulary
	VocabularySize() uint
	// Returns the cosine similarity between two strings
	Similarity(s, t string) float64
}

type VectorScalar ¶ added in v0.2.0

type VectorScalar interface {
	FloatScalar | IntScalar
}

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL