Documentation ¶
Overview ¶
Package embeddings contains helpers for creating vector embeddings from text using different providers.
The main components of this package are:
- Embedder interface: a common interface for creating vector embeddings from texts, with optional batching.
- NewEmbedder creates implementations of Embedder from provider LLM (or Chat) clients.
See the package example below.
Example ¶
package main

import (
	"context"
	"log"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/openai"
)

func main() { //nolint:testableexamples
	llm, err := openai.New()
	if err != nil {
		log.Fatal(err)
	}

	// Create a new Embedder from the given LLM.
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		log.Fatal(err)
	}

	docs := []string{"doc 1", "another doc"}
	embs, err := embedder.EmbedDocuments(context.Background(), docs)
	if err != nil {
		log.Fatal(err)
	}

	// Consume embs
	_ = embs
}
Index ¶
- Variables
- func BatchTexts(texts []string, batchSize int) [][]string
- func BatchedEmbed(ctx context.Context, embedder EmbedderClient, texts []string, batchSize int) ([][]float32, error)
- func CombineVectors(vectors [][]float32, weights []int) ([]float32, error)
- func MaybeRemoveNewLines(texts []string, removeNewLines bool) []string
- type Embedder
- type EmbedderClient
- type EmbedderClientFunc
- type EmbedderImpl
- type Option
Variables ¶
var (
	// ErrVectorsNotSameSize is returned if the vectors returned from the
	// embeddings API have different sizes.
	ErrVectorsNotSameSize = errors.New("vectors gotten not the same size")
	// ErrAllTextsLenZero is returned if all texts to be embedded have a
	// combined length of zero.
	ErrAllTextsLenZero = errors.New("all texts have length 0")
)
Functions ¶
func BatchTexts ¶
func BatchTexts(texts []string, batchSize int) [][]string
BatchTexts splits each of the given strings into pieces of length batchSize.
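A minimal sketch of this kind of splitting, written without importing the package. The names splitByRunes and splitAll are hypothetical, and counting length in runes rather than bytes is an assumption, as is the requirement that size be positive.

```go
package main

import "fmt"

// splitByRunes cuts text into consecutive pieces of at most size runes.
// It assumes size > 0.
func splitByRunes(text string, size int) []string {
	runes := []rune(text)
	pieces := make([]string, 0, (len(runes)+size-1)/size)
	for start := 0; start < len(runes); start += size {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		pieces = append(pieces, string(runes[start:end]))
	}
	return pieces
}

// splitAll applies splitByRunes to every text, returning one slice of
// pieces per input string.
func splitAll(texts []string, size int) [][]string {
	out := make([][]string, 0, len(texts))
	for _, t := range texts {
		out = append(out, splitByRunes(t, size))
	}
	return out
}

func main() {
	fmt.Println(splitAll([]string{"héllo", "ab"}, 2)) // prints [[hé ll o] [ab]]
}
```

Working on runes keeps multi-byte characters such as "é" intact; slicing the string by bytes could cut one in half.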
func BatchedEmbed ¶
func BatchedEmbed(ctx context.Context, embedder EmbedderClient, texts []string, batchSize int) ([][]float32, error)
BatchedEmbed creates embeddings for the given input texts, batching them into batches of batchSize if needed.
func MaybeRemoveNewLines ¶
func MaybeRemoveNewLines(texts []string, removeNewLines bool) []string
Types ¶
type Embedder ¶
type Embedder interface {
	// EmbedDocuments returns a vector for each text.
	EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
	// EmbedQuery embeds a single text.
	EmbedQuery(ctx context.Context, text string) ([]float32, error)
}
Embedder is the interface for creating vector embeddings from texts.
type EmbedderClient ¶
type EmbedderClient interface {
	CreateEmbedding(ctx context.Context, texts []string) ([][]float32, error)
}
EmbedderClient is the interface LLM clients implement for embeddings.
type EmbedderClientFunc ¶
EmbedderClientFunc is an adapter to allow the use of ordinary functions as Embedder Clients. If `f` is a function with the appropriate signature, `EmbedderClientFunc(f)` is an `EmbedderClient` that calls `f`.
func (EmbedderClientFunc) CreateEmbedding ¶
type EmbedderImpl ¶
type EmbedderImpl struct {
	StripNewLines bool
	BatchSize     int
	// contains filtered or unexported fields
}
func NewEmbedder ¶
func NewEmbedder(client EmbedderClient, opts ...Option) (*EmbedderImpl, error)
NewEmbedder creates a new Embedder from the given EmbedderClient, with some options that affect how embedding will be done.
func (*EmbedderImpl) EmbedDocuments ¶
EmbedDocuments creates one vector embedding for each of the texts.
func (*EmbedderImpl) EmbedQuery ¶
EmbedQuery embeds a single text.
type Option ¶
type Option func(p *EmbedderImpl)
func WithBatchSize ¶
WithBatchSize is an option for specifying the batch size.
func WithStripNewLines ¶
WithStripNewLines is an option for specifying whether new lines should be stripped.
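Option follows the functional options pattern: each option is a function that mutates the struct under construction. The sketch below shows the mechanics with local, lower-case stand-ins (embedder, option, withBatchSize, withStripNewLines, newEmbedder) so that it runs without the package. The parameter types and the default values are placeholders chosen for illustration, not values taken from this page.

```go
package main

import "fmt"

// embedder is a local stand-in for EmbedderImpl.
type embedder struct {
	stripNewLines bool
	batchSize     int
}

// option is a local stand-in for Option.
type option func(e *embedder)

// withBatchSize returns an option that sets the batch size.
func withBatchSize(n int) option {
	return func(e *embedder) { e.batchSize = n }
}

// withStripNewLines returns an option that sets whether new lines are stripped.
func withStripNewLines(strip bool) option {
	return func(e *embedder) { e.stripNewLines = strip }
}

// newEmbedder starts from placeholder defaults and applies the options in order.
func newEmbedder(opts ...option) *embedder {
	e := &embedder{stripNewLines: true, batchSize: 512} // illustrative defaults only
	for _, opt := range opts {
		opt(e)
	}
	return e
}

func main() {
	e := newEmbedder(withBatchSize(16), withStripNewLines(false))
	fmt.Printf("%+v\n", *e)
}
```

Because options are applied in order, a later option overrides an earlier one that sets the same field.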