embeddings

package
v0.1.3
Warning

This package is not in the latest version of its module.

Published: Sep 12, 2024 | License: MIT | Imports: 4 | Imported by: 7

Documentation

Overview

Package embeddings contains helpers for creating vector embeddings from text using different providers.

The main components of this package are:

  • Embedder interface: a common interface for creating vector embeddings from texts, with optional batching.
  • NewEmbedder: creates implementations of Embedder from provider LLM (or Chat) clients.

See the package example below.

Example
package main

import (
	"context"
	"log"

	"github.com/czc09/langchaingo/embeddings"
	"github.com/czc09/langchaingo/llms/openai"
)

func main() { //nolint:testableexamples
	llm, err := openai.New()
	if err != nil {
		log.Fatal(err)
	}

	// Create a new Embedder from the given LLM.
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		log.Fatal(err)
	}

	docs := []string{"doc 1", "another doc"}
	embs, err := embedder.EmbedDocuments(context.Background(), docs)
	if err != nil {
		log.Fatal(err)
	}

	// Consume embs
	_ = embs
}

Constants

This section is empty.

Variables

var (
	// ErrVectorsNotSameSize is returned if the vectors returned from the
	// embeddings API have different sizes.
	ErrVectorsNotSameSize = errors.New("vectors gotten not the same size")
	// ErrAllTextsLenZero is returned if all texts to be embedded have a
	// combined length of zero.
	ErrAllTextsLenZero = errors.New("all texts have length 0")
)

Functions

func BatchTexts

func BatchTexts(texts []string, batchSize int) [][]string

BatchTexts splits each string into pieces of length batchSize.
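
The description above is brief, so the following is a minimal sketch of one plausible reading: each text is cut into pieces of at most batchSize runes, giving one slice of pieces per input text. The helper name splitByRunes, the rune-based counting, and the handling of a non-positive size are assumptions made for illustration and are not taken from the package source.

```go
package main

import "fmt"

// splitByRunes is a hypothetical stand-in for BatchTexts. It cuts every text
// into pieces of at most size runes and returns one slice of pieces per text.
// Counting runes rather than bytes is an assumption of this sketch.
func splitByRunes(texts []string, size int) [][]string {
	out := make([][]string, 0, len(texts))
	for _, text := range texts {
		// Assumption: a non-positive size means "do not split".
		if size <= 0 {
			out = append(out, []string{text})
			continue
		}
		runes := []rune(text)
		pieces := make([]string, 0, len(runes)/size+1)
		for i := 0; i < len(runes); i += size {
			end := i + size
			if end > len(runes) {
				end = len(runes)
			}
			pieces = append(pieces, string(runes[i:end]))
		}
		out = append(out, pieces)
	}
	return out
}

func main() {
	fmt.Println(splitByRunes([]string{"abcde", "xy"}, 2)) // [[ab cd e] [xy]]
}
```

Note that under this reading the result is indexed first by input text and then by piece, which matches the [][]string return type in the signature.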

func BatchedEmbed

func BatchedEmbed(ctx context.Context, embedder EmbedderClient, texts []string, batchSize int) ([][]float32, error)

BatchedEmbed creates embeddings for the given input texts, batching them into batches of batchSize if needed.

func CombineVectors

func CombineVectors(vectors [][]float32, weights []int) ([]float32, error)
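
CombineVectors carries no description on this page. Its signature, together with the ErrVectorsNotSameSize and ErrAllTextsLenZero variables, is consistent with a weighted average of the input vectors, but that reading is an assumption. The sketch below shows a weighted average under that assumption; weightedAverage and its two error values are hypothetical names, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors for this sketch only.
var (
	errSizeMismatch = errors.New("vectors and weights do not line up")
	errZeroWeight   = errors.New("weights sum to zero")
)

// weightedAverage returns sum(weights[i] * vectors[i]) / sum(weights).
// It is an illustration of one possible meaning of CombineVectors.
func weightedAverage(vectors [][]float32, weights []int) ([]float32, error) {
	if len(vectors) == 0 || len(vectors) != len(weights) {
		return nil, errSizeMismatch
	}
	total := 0
	for _, w := range weights {
		total += w
	}
	if total == 0 {
		return nil, errZeroWeight
	}
	avg := make([]float32, len(vectors[0]))
	for i, vec := range vectors {
		// Every vector must have the same dimension as the first one.
		if len(vec) != len(avg) {
			return nil, errSizeMismatch
		}
		for j, x := range vec {
			avg[j] += x * float32(weights[i])
		}
	}
	for j := range avg {
		avg[j] /= float32(total)
	}
	return avg, nil
}

func main() {
	avg, err := weightedAverage([][]float32{{1, 0}, {0, 1}}, []int{3, 1})
	fmt.Println(avg, err) // [0.75 0.25] <nil>
}
```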

func MaybeRemoveNewLines

func MaybeRemoveNewLines(texts []string, removeNewLines bool) []string
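
MaybeRemoveNewLines is also undocumented here. Judging by its name and the removeNewLines flag, it presumably returns the texts with newline characters removed when the flag is true and unchanged otherwise. The sketch below assumes that behavior and assumes each newline is replaced by a single space; stripNewLines is a hypothetical helper, not the package function.

```go
package main

import (
	"fmt"
	"strings"
)

// stripNewLines is a hypothetical stand-in for MaybeRemoveNewLines.
// Replacing "\n" with a space (rather than deleting it) is an assumption.
func stripNewLines(texts []string, remove bool) []string {
	if !remove {
		return texts
	}
	out := make([]string, len(texts))
	for i, t := range texts {
		out[i] = strings.ReplaceAll(t, "\n", " ")
	}
	return out
}

func main() {
	fmt.Printf("%q\n", stripNewLines([]string{"a\nb", "c"}, true)) // ["a b" "c"]
}
```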

Types

type Embedder

type Embedder interface {
	// EmbedDocuments returns a vector for each text.
	EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
	// EmbedQuery embeds a single text.
	EmbedQuery(ctx context.Context, text string) ([]float32, error)
}

Embedder is the interface for creating vector embeddings from texts.

type EmbedderClient

type EmbedderClient interface {
	CreateEmbedding(ctx context.Context, texts []string) ([][]float32, error)
}

EmbedderClient is the interface LLM clients implement for embeddings.

type EmbedderImpl

type EmbedderImpl struct {
	StripNewLines bool
	BatchSize     int
	// contains filtered or unexported fields
}

func NewEmbedder

func NewEmbedder(client EmbedderClient, opts ...Option) (*EmbedderImpl, error)

NewEmbedder creates a new Embedder from the given EmbedderClient, with some options that affect how embedding will be done.

func (*EmbedderImpl) EmbedDocuments

func (ei *EmbedderImpl) EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)

EmbedDocuments creates one vector embedding for each of the texts.

func (*EmbedderImpl) EmbedQuery

func (ei *EmbedderImpl) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery embeds a single text.

type Option

type Option func(p *EmbedderImpl)

func WithBatchSize

func WithBatchSize(batchSize int) Option

WithBatchSize is an option for specifying the batch size.

func WithStripNewLines

func WithStripNewLines(stripNewLines bool) Option

WithStripNewLines is an option for specifying whether new lines should be stripped.
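
Option follows the functional options pattern: each option is a function that mutates the value being built, and the constructor applies them in order. The sketch below reproduces the pattern on a local struct with the same two exported fields as EmbedderImpl. The lowercase names and the default values are illustrative assumptions, not the package's actual defaults.

```go
package main

import "fmt"

// embedderConfig stands in for EmbedderImpl's exported fields.
type embedderConfig struct {
	StripNewLines bool
	BatchSize     int
}

// option has the same shape as the package's Option type.
type option func(c *embedderConfig)

// withBatchSize sets the batch size.
func withBatchSize(n int) option {
	return func(c *embedderConfig) { c.BatchSize = n }
}

// withStripNewLines sets whether new lines are stripped.
func withStripNewLines(strip bool) option {
	return func(c *embedderConfig) { c.StripNewLines = strip }
}

// newConfig starts from defaults and applies each option in order.
// The defaults here are placeholders chosen for the example.
func newConfig(opts ...option) *embedderConfig {
	c := &embedderConfig{StripNewLines: true, BatchSize: 512}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", *newConfig(withBatchSize(16))) // {StripNewLines:true BatchSize:16}
}
```

With the real package, the equivalent call would pass the options to NewEmbedder after the client, as its signature shows.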

Directories

Path Synopsis
m3e
Huggingface Text Embeddings Inference https://github.com/huggingface/text-embeddings-inference
