txt

About

This project implements a language model that uses contexts and context mixing to produce an embedding vector. Each context is a histogram of the symbol counts found in a circular symbol buffer. There are eight contexts with circular buffer sizes of 1, 2, 4, 8, 16, 32, 64, and 128, each fed with 8-bit symbols. Context mixing is performed with self attention: the eight histogram contexts are compressed down to a single embedding vector, which is then associated with the next symbol. Nearest-neighbor search is used to infer the next symbol for a given embedding.
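
Concretely, each context can be pictured as a fixed-size circular buffer paired with a 256-bin count vector. The sketch below illustrates that idea; the Histogram and Vector names mirror the Mix code in the next section, but the Buffer, Index, and Full fields and the Add method are assumptions about how such a context could be maintained, not necessarily this package's implementation.

// Sketch of a single histogram context over a circular symbol buffer.
// The field names beyond Vector and the Add method are illustrative
// assumptions, not the package's actual code.
type Histogram struct {
	Vector [256]uint16 // counts of each 8-bit symbol currently in the buffer
	Buffer []byte      // circular buffer of the most recent symbols
	Index  int         // next write position in the buffer
	Full   bool        // true once the buffer has wrapped around
}

// NewHistogram creates a context with the given circular buffer size
// (1, 2, 4, 8, 16, 32, 64, or 128 in this project).
func NewHistogram(size int) *Histogram {
	return &Histogram{Buffer: make([]byte, size)}
}

// Add pushes a symbol into the circular buffer, evicting the oldest
// symbol's count once the buffer is full.
func (h *Histogram) Add(symbol byte) {
	if h.Full {
		h.Vector[h.Buffer[h.Index]]--
	}
	h.Buffer[h.Index] = symbol
	h.Vector[symbol]++
	h.Index++
	if h.Index == len(h.Buffer) {
		h.Index = 0
		h.Full = true
	}
}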

Mixer

// Mix mixes the histograms into a single 256-byte vector
func (m Mixer) Mix() [256]byte {
	mix := [256]byte{}
	x := NewMatrix(256, Size)
	for i := range m.Histograms {
		// normalize each histogram so its counts sum to one and
		// append it as a row of the context matrix
		sum := 0.0
		for _, v := range m.Histograms[i].Vector {
			sum += float64(v)
		}
		for _, v := range m.Histograms[i].Vector {
			x.Data = append(x.Data, float64(v)/sum)
		}
	}
	// mix the contexts with self attention and sum the resulting rows
	y := SelfAttention(x, x, x).Sum()
	// rescale the mixed vector into bytes
	sum := 0.0
	for _, v := range y.Data {
		sum += v
	}
	for i := range mix {
		mix[i] = byte(128 * y.Data[i] / sum)
	}
	return mix
}
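
The SelfAttention and Sum helpers are not shown here. Assuming they compute standard softmax(QKᵀ)V attention over the rows of the context matrix and then sum the resulting rows, the mixing step is roughly equivalent to the following sketch. The selfAttentionSum function and the plain [][]float64 representation are illustrative only, and the standard math package is required.

// selfAttentionSum is a minimal sketch of self attention followed by a
// row sum, applied to the eight rows of 256 normalized counts. It is an
// assumption about what SelfAttention(x, x, x).Sum() computes, not the
// package's actual implementation.
func selfAttentionSum(rows [][]float64) []float64 {
	out := make([]float64, len(rows[0]))
	for _, q := range rows {
		// attention weights: softmax of the dot products between the
		// query row and every key row
		weights := make([]float64, len(rows))
		max := math.Inf(-1)
		for j, k := range rows {
			dot := 0.0
			for d := range q {
				dot += q[d] * k[d]
			}
			weights[j] = dot
			if dot > max {
				max = dot
			}
		}
		sum := 0.0
		for j := range weights {
			weights[j] = math.Exp(weights[j] - max)
			sum += weights[j]
		}
		// accumulate the weighted combination of value rows; adding the
		// result for every query row mirrors the final Sum
		for j, v := range rows {
			w := weights[j] / sum
			for d := range v {
				out[d] += w * v[d]
			}
		}
	}
	return out
}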

Usage

Clone the repo and then:

go build

To build the vector database (1.1GB):

./txt -build

To query the vector database using brute-force nearest neighbor:

./txt -brute -query "God"

To query the vector database using approximate nearest neighbor:

./txt -query "God"
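
The -brute query corresponds to an exact scan over the stored embeddings. A minimal sketch of such a lookup is shown below; the Entry type, the cosine-similarity metric, and the quantized [256]byte embeddings are assumptions based on the Mix output, not the actual database format. The standard math package is required.

// Entry is a hypothetical record in the vector database, pairing an
// embedding with the next symbol that followed it.
type Entry struct {
	Embedding [256]byte // quantized embedding from Mix
	Symbol    byte      // next symbol associated with the embedding
}

// nearest returns the symbol of the entry whose embedding has the
// highest cosine similarity to the query (a brute-force scan).
func nearest(query [256]byte, entries []Entry) byte {
	best, bestScore := byte(0), math.Inf(-1)
	for _, e := range entries {
		dot, qq, ee := 0.0, 0.0, 0.0
		for i := range query {
			q, v := float64(query[i]), float64(e.Embedding[i])
			dot += q * v
			qq += q * q
			ee += v * v
		}
		score := dot / (math.Sqrt(qq)*math.Sqrt(ee) + 1e-9)
		if score > bestScore {
			best, bestScore = e.Symbol, score
		}
	}
	return best
}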
