Documentation ¶
Overview ¶
Package gowe provides types and functions to represent a word embedding model and to consume models encoded in multiple formats. gowe can consume models in the following formats:
- plaintext (plain), e.g. "the 0.418 0.24968 ...". The first line may give the vocabulary size and dimension, e.g. "300000 128"; in that case it is skipped.
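To make the plaintext layout concrete, here is a minimal sketch of a reader for it. It is not gowe's implementation: `parsePlain`, `parsePlainString`, and `isHeader` are hypothetical helpers, and the header check (a first line made of exactly two unsigned integers) is an assumption about how such a line could be detected.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// parsePlain reads lines of the form "<word> <scalar> <scalar> ..." and
// returns a word-to-vector map. Hypothetical helper, not part of gowe.
func parsePlain(r io.Reader) (map[string][]float64, error) {
	vectors := make(map[string][]float64)
	sc := bufio.NewScanner(r)
	first := true
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // ignore blank lines
		}
		if first {
			first = false
			if isHeader(fields) {
				continue // skip "<vocabulary size> <dim>"
			}
		}
		if len(fields) < 2 {
			return nil, fmt.Errorf("word %q has no scalars", fields[0])
		}
		vec := make([]float64, 0, len(fields)-1)
		for _, f := range fields[1:] {
			x, err := strconv.ParseFloat(f, 64)
			if err != nil {
				return nil, fmt.Errorf("word %q: %w", fields[0], err)
			}
			vec = append(vec, x)
		}
		vectors[fields[0]] = vec
	}
	return vectors, sc.Err()
}

// parsePlainString is a convenience wrapper for in-memory input.
func parsePlainString(s string) (map[string][]float64, error) {
	return parsePlain(strings.NewReader(s))
}

// isHeader reports whether a line looks like "<vocabulary size> <dim>".
// Assumption: a header is exactly two unsigned integers.
func isHeader(fields []string) bool {
	if len(fields) != 2 {
		return false
	}
	for _, f := range fields {
		if _, err := strconv.ParseUint(f, 10, 64); err != nil {
			return false
		}
	}
	return true
}

func main() {
	in := "2 3\nthe 0.418 0.24968 -0.41242\ncat 0.1 0.2 0.3\n"
	m, err := parsePlainString(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(m), len(m["the"]))
}
```

The heuristic would misread a one-dimensional model whose first word is itself a number, so a real loader may prefer an explicit flag over detection.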
Index ¶
- func NNearestIn[T VectorScalar, M Model[T]](m M, s string, vocab []string, n uint) ([]string, error)
- func QuantizationShift[I IntScalar](maxMagnitude float64) uint8
- func RankSimilarity[T VectorScalar, M Model[T]](m M, s string, vocab []string) []string
- type FloatModel
- func (m *FloatModel[F]) Dimensions() uint
- func (m *FloatModel[F]) FromBinaryFile(p string, bitSize int, _ ...interface{}) error
- func (m *FloatModel[F]) FromPlainFile(p string, desc bool, _ ...interface{}) error
- func (m *FloatModel[F]) Similarity(s, t string) float64
- func (m *FloatModel[F]) Vector(s string) []F
- func (m *FloatModel[F]) VocabularySize() uint
- type FloatScalar
- type FloatVector
- func (v FloatVector[F]) Add(u FloatVector[F]) FloatVector[F]
- func (v FloatVector[F]) CosineSimilarity(u FloatVector[F]) float64
- func (v FloatVector[F]) Dot(u FloatVector[F]) float64
- func (v FloatVector[F]) Magnitude() float64
- func (v FloatVector[F]) Normalize() FloatVector[F]
- func (v FloatVector[F]) Subtract(u FloatVector[F]) FloatVector[F]
- type IntModel
- func (m *IntModel[I]) Dimensions() uint
- func (m *IntModel[I]) FromBinaryFile(p string, bitSize int, opts ...interface{}) error
- func (m *IntModel[I]) FromPlainFile(p string, desc bool, opts ...interface{}) error
- func (m *IntModel[I]) Similarity(s, t string) float64
- func (m *IntModel[I]) Vector(s string) []I
- func (m *IntModel[I]) VocabularySize() uint
- type IntScalar
- type IntVector
- func (v IntVector[I]) Add(u IntVector[I]) IntVector[I]
- func (v IntVector[int32]) CosineSimilarity(u IntVector[int32]) float64
- func (v IntVector[I]) Dot(u IntVector[I]) float64
- func (v IntVector[I]) Magnitude() float64
- func (v IntVector[I]) Normalize() IntVector[I]
- func (v IntVector[I]) Subtract(u IntVector[I]) IntVector[I]
- type Model
- type VectorScalar
Functions ¶
func NNearestIn ¶ added in v0.2.0
func QuantizationShift ¶ added in v0.2.0
QuantizationShift determines the maximum integer bit shift for an expected maximum magnitude (which must be positive), leaving some extra headroom for vector operations.
For example, to convert the float64 vector [-2.89, 0.2] to an int8 vector, we might choose a maximum magnitude of 3.00 and call
v := FloatVector[float64]{scalars: []float64{-2.89, 0.2}}
shift := QuantizationShift[int8](3.0)
qV := QuantizeFloatVector[int8](v, shift)
The ceiling of Log2(3.0) is ceil(1.58) = 2, so we need 2 bits to represent the whole-number component of the magnitude. We then reserve 2 extra bits for math and 1 bit for the sign, so the resulting quantized ints look like this:
| sign[1] | extra[2] | whole[2] | fractional[3] |
The resulting shift is 3, giving a decimal precision of 2^-3 = 0.125. If we convert the int8 values back to float64, we should get:
[-2.875, 0.25]
The same shift can then be passed to QuantizeFloatVector to quantize a whole group of vectors.
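The arithmetic in this example can be checked with a small standalone sketch. The functions below are hypothetical stand-ins, not gowe's code: `quantizationShift` follows the bit budget described here (1 sign bit, 2 spare bits, ceil(log2(maxMagnitude)) whole bits), and rounding to the nearest integer is an assumption inferred from 0.2 mapping to 0.25.

```go
package main

import (
	"fmt"
	"math"
)

// quantizationShift returns the number of fractional bits left in an integer
// of bitWidth bits after reserving 1 sign bit, 2 spare bits, and
// ceil(log2(maxMagnitude)) whole-number bits.
// Hypothetical stand-in; it does not guard against a negative result.
func quantizationShift(bitWidth int, maxMagnitude float64) uint8 {
	whole := int(math.Ceil(math.Log2(maxMagnitude)))
	return uint8(bitWidth - 1 - 2 - whole)
}

// quantize scales x by 2^shift and rounds to the nearest integer
// (rounding mode is an assumption).
func quantize(x float64, shift uint8) int8 {
	return int8(math.Round(x * float64(int(1)<<shift)))
}

// dequantize undoes the scaling applied by quantize.
func dequantize(q int8, shift uint8) float64 {
	return float64(q) / float64(int(1)<<shift)
}

func main() {
	shift := quantizationShift(8, 3.0)
	fmt.Println("shift:", shift)
	for _, x := range []float64{-2.89, 0.2} {
		q := quantize(x, shift)
		fmt.Println(x, "->", q, "->", dequantize(q, shift))
	}
}
```

With an 8-bit width and a maximum magnitude of 3.0 this yields a shift of 3, and the two scalars round-trip to -2.875 and 0.25, matching the values above.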
func RankSimilarity ¶ added in v0.2.0
func RankSimilarity[T VectorScalar, M Model[T]](m M, s string, vocab []string) []string
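The page gives only the signature, so the following is a sketch of what ranking a vocabulary by similarity could look like, under the assumption that the result is ordered from most to least similar to s. `similarityModel`, `toyModel`, and `rankBySimilarity` are hypothetical names; only the `Similarity(s, t string) float64` method shape comes from the Model interface.

```go
package main

import (
	"fmt"
	"sort"
)

// similarityModel is a hypothetical one-method subset of a model.
type similarityModel interface {
	Similarity(s, t string) float64
}

// toyModel stores precomputed similarities keyed by "s|t" (illustration only).
type toyModel map[string]float64

func (m toyModel) Similarity(s, t string) float64 { return m[s+"|"+t] }

// rankBySimilarity returns a copy of vocab sorted by descending similarity
// to s. The descending order is an assumption, not documented behavior.
func rankBySimilarity(m similarityModel, s string, vocab []string) []string {
	ranked := append([]string(nil), vocab...) // leave the caller's slice untouched
	sort.SliceStable(ranked, func(i, j int) bool {
		return m.Similarity(s, ranked[i]) > m.Similarity(s, ranked[j])
	})
	return ranked
}

func main() {
	m := toyModel{"king|queen": 0.8, "king|prince": 0.7, "king|apple": 0.1}
	fmt.Println(rankBySimilarity(m, "king", []string{"apple", "prince", "queen"}))
}
```

Taking the first n entries of such a ranking would give an n-nearest lookup over the same vocabulary.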
Types ¶
type FloatModel ¶ added in v0.2.0
type FloatModel[F FloatScalar] struct {
	// contains filtered or unexported fields
}
func NewFloatModel ¶ added in v0.2.0
func NewFloatModel[F FloatScalar]() *FloatModel[F]
func (*FloatModel[F]) Dimensions ¶ added in v0.2.0
func (m *FloatModel[F]) Dimensions() uint
func (*FloatModel[F]) FromBinaryFile ¶ added in v0.3.0
func (m *FloatModel[F]) FromBinaryFile(p string, bitSize int, _ ...interface{}) error
func (*FloatModel[F]) FromPlainFile ¶ added in v0.2.0
func (m *FloatModel[F]) FromPlainFile(p string, desc bool, _ ...interface{}) error
func (*FloatModel[F]) Similarity ¶ added in v0.2.0
func (m *FloatModel[F]) Similarity(s, t string) float64
func (*FloatModel[F]) Vector ¶ added in v0.2.0
func (m *FloatModel[F]) Vector(s string) []F
func (*FloatModel[F]) VocabularySize ¶ added in v0.2.0
func (m *FloatModel[F]) VocabularySize() uint
type FloatScalar ¶ added in v0.2.0
type FloatVector ¶
type FloatVector[F FloatScalar] struct {
	// contains filtered or unexported fields
}
func DequantizeIntVector ¶
func DequantizeIntVector[F FloatScalar, I IntScalar](v IntVector[I]) FloatVector[F]
func (FloatVector[F]) Add ¶
func (v FloatVector[F]) Add(u FloatVector[F]) FloatVector[F]
func (FloatVector[F]) CosineSimilarity ¶
func (v FloatVector[F]) CosineSimilarity(u FloatVector[F]) float64
CosineSimilarity is a fused-loop implementation: the dot product and both magnitudes are accumulated in a single pass over the vectors.
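As an illustration of the fused-loop idea, the sketch below accumulates the dot product and both squared magnitudes in one loop instead of three. It works on plain slices because FloatVector's fields are unexported; `floatScalar` and `cosineSimilarity` are hypothetical names, and returning 0 for a zero vector is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"math"
)

// floatScalar is a local constraint for this sketch.
type floatScalar interface {
	~float32 | ~float64
}

// cosineSimilarity computes dot(v, u) / (|v| * |u|) in a single pass.
// It assumes v and u have the same length.
func cosineSimilarity[F floatScalar](v, u []F) float64 {
	var dot, vv, uu float64
	for i := range v {
		a, b := float64(v[i]), float64(u[i])
		dot += a * b // numerator
		vv += a * a  // squared magnitude of v
		uu += b * b  // squared magnitude of u
	}
	if vv == 0 || uu == 0 {
		return 0 // assumption: similarity with a zero vector is reported as 0
	}
	return dot / math.Sqrt(vv*uu)
}

func main() {
	fmt.Println(cosineSimilarity([]float32{1, 2, 3}, []float32{2, 4, 6}))
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{0, 1}))
}
```

Fusing the loops reads each element once, which tends to matter when the same comparison runs across a large vocabulary.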
func (FloatVector[F]) Dot ¶
func (v FloatVector[F]) Dot(u FloatVector[F]) float64
func (FloatVector[F]) Magnitude ¶
func (v FloatVector[F]) Magnitude() float64
func (FloatVector[F]) Normalize ¶
func (v FloatVector[F]) Normalize() FloatVector[F]
func (FloatVector[F]) Subtract ¶
func (v FloatVector[F]) Subtract(u FloatVector[F]) FloatVector[F]
type IntModel ¶ added in v0.2.0
type IntModel[I IntScalar] struct {
	// contains filtered or unexported fields
}
func NewIntModel ¶ added in v0.2.0
func (*IntModel[I]) Dimensions ¶ added in v0.2.0
func (*IntModel[I]) FromBinaryFile ¶ added in v0.3.0
func (*IntModel[I]) FromPlainFile ¶ added in v0.2.0
func (*IntModel[I]) Similarity ¶ added in v0.2.0
func (*IntModel[I]) VocabularySize ¶ added in v0.2.0
type IntVector ¶
IntVector is used for quantized representations of FloatVectors. The shift value is the number of bits by which each integer is shifted from the underlying float's real magnitude; in other words, the number of bits that represent the fractional portion of the scalars.
func QuantizeFloatVector ¶
func QuantizeFloatVector[I IntScalar, F FloatScalar](v FloatVector[F], shift uint8) IntVector[I]
func (IntVector[I]) Add ¶
Never operate on IntVectors with different shifts. This operation is designed to be fast, so it does not check that the shifts match.
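A short sketch shows why mixing shifts gives wrong results. The `fixed` type and its helpers are hypothetical and operate on a single scalar rather than a vector; they only illustrate the fixed-point arithmetic, not gowe's internals.

```go
package main

import "fmt"

// fixed is a hypothetical fixed-point scalar: raw holds the real value
// multiplied by 2^shift.
type fixed struct {
	raw   int32
	shift uint8
}

// float converts the fixed-point value back to a float64.
func (f fixed) float() float64 {
	return float64(f.raw) / float64(int64(1)<<f.shift)
}

// addUnchecked adds the raw integers and keeps a's shift, the way a fast add
// that skips validation would.
func addUnchecked(a, b fixed) fixed {
	return fixed{raw: a.raw + b.raw, shift: a.shift}
}

// alignTo re-expresses f at a larger or equal shift.
// It must not be called with shift < f.shift.
func alignTo(f fixed, shift uint8) fixed {
	return fixed{raw: f.raw << (shift - f.shift), shift: shift}
}

func main() {
	a := fixed{raw: 12, shift: 3} // 12 / 8 = 1.5
	b := fixed{raw: 6, shift: 2}  // 6 / 4 = 1.5

	fmt.Println(addUnchecked(a, b).float())             // 2.25, not 3
	fmt.Println(addUnchecked(a, alignTo(b, 3)).float()) // 3
}
```

Both operands represent 1.5, yet the unchecked sum reads as 2.25 because the second raw value is interpreted at the wrong scale; quantizing every vector with the same shift avoids the problem.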
func (IntVector[int32]) CosineSimilarity ¶
CosineSimilarity is a fused-loop implementation: the dot product and both magnitudes are accumulated in a single pass over the vectors.
type Model ¶
type Model[T VectorScalar] interface {
	// Loads model from plaintext file
	FromPlainFile(p string, desc bool, opts ...interface{}) error
	// Loads model from binary file
	// Binary files must have a description and scalars can be either float32
	// or float64, and the user passes that in via bitSize. If bitSize is not
	// 64, it defaults to 32, which is the standard.
	FromBinaryFile(p string, bitSize int, opts ...interface{}) error
	// Returns vector as array of scalars for a word. Note that for IntModels,
	// this will return the shifted quantized ints.
	Vector(s string) []T
	// Returns dimensions
	Dimensions() uint
	// Returns size of vocabulary
	VocabularySize() uint
	// Returns the cosine similarity between two strings
	Similarity(s, t string) float64
}
type VectorScalar ¶ added in v0.2.0
type VectorScalar interface { FloatScalar | IntScalar }