search

Version: v0.3.0 (latest) | Published: Oct 28, 2024 | License: MIT | Imports: 14 | Imported by: 0

README ¶

Semantic Search

This library was created to provide an easy and efficient solution for embedding and vector search, making it perfect for small to medium-scale projects that still need some serious semantic power. It’s built around a simple idea: if your dataset is small enough, you can achieve accurate results with brute-force techniques, and with some smart optimizations like SIMD, you can keep things fast and lean.

The library’s strength lies in its simplicity and support for GGUF BERT models, letting you leverage sophisticated embeddings without getting bogged down by the complexities of traditional search systems. It offers GPU acceleration, enabling quick computations on supported hardware. If your dataset has fewer than 100,000 entries, this library is a great fit for integrating semantic search into your Go applications with minimal hassle.

🚀 Key Features

llama.cpp without cgo: The library is built to work with llama.cpp without using cgo. Instead, it relies on purego , which allows calling shared C libraries directly from Go code without the need for cgo. This design significantly simplifies the integration, deployment, and cross-compilation, making it easier to build Go applications that interface with native libraries.
Support for BERT Models: The library supports BERT models via llama.cpp. A wide variety of BERT models can be used, as long as they are in GGUF format.
Precompiled Binaries with Vulkan GPU Support: Available for Windows and Linux in the dist directory, compiled with Vulkan for GPU acceleration. However, you can compile the library yourself with or without GPU support.
Search Index for Embeddings: The library supports the creation of a search index from computed embeddings, which can be saved to disk and loaded later. This feature is suitable for basic vector-based searches in small-scale applications, but it may face efficiency challenges with large datasets due to the use of brute-force techniques.

🤔 Limitations

While simple vector search excels in small-scale applications, avoid using this library if you have any of the following requirements.

Large Datasets: The current implementation is designed for small-scale applications, and datasets exceeding 100,000 entries may suffer from performance bottlenecks due to the brute-force search approach. For larger datasets, approximate nearest neighbor (ANN) algorithms and specialized data structures should be considered for efficiency.
Complex Query Requirements: The library focuses on simple vector similarity search and does not support advanced query capabilities like multi-field filtering, fuzzy matching, or SQL-like operations that are common in more sophisticated search engines.
High-Dimensional Complex Embeddings: Large language models (LLMs) generate embeddings that are both high-dimensional and computationally intensive. Handling these embeddings in real-time can be taxing on the system unless sufficient GPU resources are available and optimized for low-latency inference.
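
To put the 100,000-entry guideline in perspective, the following back-of-the-envelope sketch estimates what one exact (brute-force) query costs. The helper bruteForceCost is hypothetical and not part of the library, and the 384-dimension figure is an assumption for a MiniLM-L6-v2 style model; check the output size of the model you actually load. Counting one multiply-add per vector component is a simplification that ignores normalization and sorting.

```go
package main

import "fmt"

// bruteForceCost estimates the resources needed for one exact search over
// n float32 vectors of the given dimension. It returns the raw size of the
// stored vectors in bytes and the number of multiply-adds per query.
// This is a hypothetical helper for illustration only.
func bruteForceCost(n, dim int) (bytes, mulAdds int64) {
	const float32Size = 4 // bytes per float32 component
	bytes = int64(n) * int64(dim) * float32Size
	mulAdds = int64(n) * int64(dim) // one multiply-add per component per entry
	return bytes, mulAdds
}

func main() {
	// 384 dimensions is an assumption, not a value taken from the library.
	for _, n := range []int{10_000, 100_000, 1_000_000} {
		bytes, ops := bruteForceCost(n, 384)
		fmt.Printf("%9d entries: %8.1f MB, %d multiply-adds per query\n",
			n, float64(bytes)/1e6, ops)
	}
}
```

Both numbers grow linearly with the number of entries, which is why exact search stays practical for small datasets and why ANN structures become attractive beyond that.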

📚 How to Use the Library

This example demonstrates how to use the library to generate embeddings for text and perform a simple vector search. The code snippet below shows how to load a model, generate embeddings for text, create a search index, and perform a search.

Install library: Precompiled binaries for Windows and Linux are provided in the dist directory. If your target architecture or platform isn't covered by these binaries, you'll need to compile the library from the source. Drop these binaries in /usr/lib or equivalent.
Load a model: The search.NewVectorizer function initializes a model from a GGUF file. This example loads the MiniLM-L6-v2.Q8_0.gguf model. The second parameter indicates the number of GPU layers to enable (0 for CPU only).

m, err := search.NewVectorizer("../dist/MiniLM-L6-v2.Q8_0.gguf", 0)
if err != nil {
    // handle error
}
defer m.Close()

Generate text embeddings: The EmbedText method is used to generate vector embeddings for a given text input. This converts your text into a dense numerical vector representation given the model you loaded in the previous step.

embedding, err := m.EmbedText("Your text here")

Create an index and add vectors: Create a new index using search.NewIndex. The type parameter [string] in this example specifies that each vector is associated with a string value. You can add multiple vectors with corresponding labels.

index := search.NewIndex[string]()
index.Add(embedding, "Your text here")

Search the index: Perform a search using the Search method, which takes an embedding vector and a number of results to retrieve. This example searches for the 10 most relevant results and prints them along with their relevance scores.

results := index.Search(embedding, 10)
for _, r := range results {
    fmt.Printf("Result: %s (Relevance: %.2f)\n", r.Value, r.Relevance)
}
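
To make the steps above more concrete, here is a minimal, self-contained sketch of what a brute-force generic index can look like. The types toyIndex, entry and result are hypothetical stand-ins that only mirror the shape of Index[T] and Result[T]; they are not the library's implementation. The sketch also assumes that relevance is cosine similarity, which the internal cosine/simd directory name suggests but the documentation does not state. Hand-written two-dimensional vectors take the place of real embeddings so the example runs without a model.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// result mirrors the shape of the library's Result[T] (hypothetical copy).
type result[T any] struct {
	Relevance float64
	Value     T
}

// entry pairs a stored vector with the value it labels.
type entry[T any] struct {
	vec  []float32
	item T
}

// toyIndex is a hypothetical brute-force index, for illustration only.
type toyIndex[T any] struct {
	entries []entry[T]
}

// Add stores a vector together with its associated value.
func (idx *toyIndex[T]) Add(vec []float32, item T) {
	idx.entries = append(idx.entries, entry[T]{vec, item})
}

// cosine returns the cosine similarity of a and b. It returns 0 when the
// lengths differ or when either vector has zero magnitude.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Search scores every entry against the query, sorts by descending
// relevance, and returns at most k results.
func (idx *toyIndex[T]) Search(query []float32, k int) []result[T] {
	out := make([]result[T], 0, len(idx.entries))
	for _, e := range idx.entries {
		out = append(out, result[T]{cosine(query, e.vec), e.item})
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Relevance > out[j].Relevance
	})
	if k < 0 {
		k = 0
	}
	if k < len(out) {
		out = out[:k]
	}
	return out
}

func main() {
	idx := &toyIndex[string]{}
	idx.Add([]float32{1, 0}, "east")
	idx.Add([]float32{0, 1}, "north")
	idx.Add([]float32{1, 1}, "north-east")

	for _, r := range idx.Search([]float32{1, 0.1}, 2) {
		fmt.Printf("Result: %s (Relevance: %.2f)\n", r.Value, r.Relevance)
	}
}
```

Because every entry is scored, the results are exact; the trade-off is that query time grows linearly with the size of the index.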

🛠 Compile library

First, clone the repository, then run the following commands from its root to fetch the submodules and the Git LFS files. The --init and --recursive flags make sure the ggml submodule, a library for matrix operations, is checked out as well.

git submodule update --init --recursive
git lfs pull

Compile on Linux

Make sure you have a C/C++ compiler and CMake installed. For Ubuntu, you can install them with the following commands:

sudo apt-get update
sudo apt-get install build-essential cmake

Then you can compile the library with the following commands:

mkdir build && cd build
cmake -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=g++ -DCMAKE_C_COMPILER=gcc ..
cmake --build . --config Release

This should generate libllama_go.so, which statically links everything necessary. You can also install the library by copying it into /usr/lib.

Compile on Windows

Make sure you have a C/C++ compiler and CMake installed. For Windows, a simple option is to use Build Tools for Visual Studio (make sure CLI tools are included) and CMake.

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --config Release

If you are using Visual Studio, solution files are generated. You can open the solution file with Visual Studio and build the project from there. The bin directory would then contain llamago.dll.

GPU and other options

To enable GPU support (e.g. Vulkan), add the appropriate flag to the CMake command; refer to the llama.cpp build documentation for details. For example, to compile with Vulkan support on Windows, make sure the Vulkan SDK is installed and then run the following commands:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON ..
cmake --build . --config Release

Documentation ¶

Index ¶

type Context
type Index
- func NewIndex[T any]() *Index[T]
type Result
type Vector
type Vectorizer
- func NewVectorizer(modelPath string, gpuLayers int) (*Vectorizer, error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Context ¶

type Context struct {
	// contains filtered or unexported fields
}

Context represents a context for embedding text using the model.

func (*Context) Close ¶

func (ctx *Context) Close() error

Close closes the context and releases any resources associated with it.

func (*Context) EmbedText ¶

func (ctx *Context) EmbedText(text string) ([]float32, error)

EmbedText embeds the given text using the model.

func (*Context) Tokens ¶

func (ctx *Context) Tokens() uint

Tokens returns the number of tokens processed by the context.

type Index ¶

type Index[T any] struct {
	// contains filtered or unexported fields
}

Index represents a brute-force search index, returning exact results.

func NewIndex ¶

func NewIndex[T any]() *Index[T]

NewIndex creates a new exact search index.

func (*Index[T]) Add ¶

func (idx *Index[T]) Add(vx Vector, item T)

Add adds a new vector to the search index.

func (*Index[T]) Len ¶ added in v0.3.0

func (idx *Index[T]) Len() int

Len returns the number of items in the index.

func (*Index[T]) ReadFile ¶ added in v0.3.0

func (idx *Index[T]) ReadFile(filename string) error

ReadFile reads the index from a flate-compressed binary file.

func (*Index[T]) ReadFrom ¶ added in v0.3.0

func (b *Index[T]) ReadFrom(src io.Reader) (int64, error)

ReadFrom reads the index from a reader.

func (*Index[T]) Search ¶

func (idx *Index[T]) Search(query Vector, k int) []Result[T]

Search searches the index for the k-nearest neighbors of the query vector.

func (*Index[T]) WriteFile ¶ added in v0.3.0

func (idx *Index[T]) WriteFile(filename string) error

WriteFile writes the index into a flate-compressed binary file.

func (*Index[T]) WriteTo ¶ added in v0.3.0

func (b *Index[T]) WriteTo(dst io.Writer) (int64, error)

WriteTo writes the index to a writer.
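
The persistence methods above work with flate-compressed binary data and with plain io.Reader and io.Writer values. The sketch below shows how such a round trip can be done for a single vector using only the Go standard library. The on-disk layout used here (a uint32 length followed by little-endian float32 values) is an assumption made for illustration; the library's actual file format is not documented on this page and may differ. The functions writeVector, readVector and roundTrip are hypothetical helpers.

```go
package main

import (
	"bytes"
	"compress/flate"
	"encoding/binary"
	"fmt"
	"io"
)

// writeVector writes a length-prefixed []float32 through a flate compressor.
// The layout is a hypothetical format, not the library's file format.
func writeVector(dst io.Writer, vec []float32) error {
	zw, err := flate.NewWriter(dst, flate.DefaultCompression)
	if err != nil {
		return err
	}
	if err := binary.Write(zw, binary.LittleEndian, uint32(len(vec))); err != nil {
		return err
	}
	if err := binary.Write(zw, binary.LittleEndian, vec); err != nil {
		return err
	}
	return zw.Close() // Close flushes any remaining compressed data
}

// readVector reverses writeVector: it decompresses the stream, reads the
// length prefix, and then reads that many float32 values.
func readVector(src io.Reader) ([]float32, error) {
	zr := flate.NewReader(src)
	defer zr.Close()

	var n uint32
	if err := binary.Read(zr, binary.LittleEndian, &n); err != nil {
		return nil, err
	}
	vec := make([]float32, n)
	if err := binary.Read(zr, binary.LittleEndian, vec); err != nil {
		return nil, err
	}
	return vec, nil
}

// roundTrip writes vec to an in-memory buffer and reads it back.
func roundTrip(vec []float32) ([]float32, error) {
	var buf bytes.Buffer
	if err := writeVector(&buf, vec); err != nil {
		return nil, err
	}
	return readVector(&buf)
}

func main() {
	vec, err := roundTrip([]float32{0.25, -1.5, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(vec)
}
```

Writing against io.Writer and io.Reader, as WriteTo and ReadFrom do, means the same code can target a file, a network connection, or an in-memory buffer.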

type Result ¶

type Result[T any] struct {
	Relevance float64 // The relevance of the result
	Value     T       // The value of the result
}

Result represents a search result.

type Vector ¶

type Vector = []float32

type Vectorizer ¶

type Vectorizer struct {
	// contains filtered or unexported fields
}

Vectorizer represents a loaded LLM/Embedding model.

func NewVectorizer ¶

func NewVectorizer(modelPath string, gpuLayers int) (*Vectorizer, error)

NewVectorizer creates a new vectorizer model from the given model file.

func (*Vectorizer) Close ¶

func (m *Vectorizer) Close() error

Close closes the model and releases any resources associated with it.

func (*Vectorizer) Context ¶

func (m *Vectorizer) Context(size int) *Context

Context creates a new context of the given size.

func (*Vectorizer) EmbedText ¶

func (m *Vectorizer) EmbedText(text string) ([]float32, error)

EmbedText embeds the given text using the model.

Directories ¶

Path	Synopsis
example
internal
cosine/simd
eval
