Documentation ¶
Overview ¶
Package ollama provides an Ollama API client.
Index ¶
- Constants
- type ConversationContext
- type Embedding
- type EmbeddingResponse
- type Model
- type ModelLoadStats
- type Ollama
- func (llm *Ollama) Embed(ctx context.Context, model *Model, inputs []string) (*EmbeddingResponse, error)
- func (llm *Ollama) Prompt(ctx context.Context, prompt *Prompt) (*Response, error)
- func (llm *Ollama) PromptUntil(ctx context.Context, prompt *Prompt, ...) (*Response, error)
- func (llm *Ollama) SetCheapModels(cheapModels []*Model)
- func (llm *Ollama) WaitUntilServing(ctx context.Context) error
- func (llm *Ollama) WarmModel(ctx context.Context, model *Model, keepWarmFor time.Duration, unloadFirst bool) (*ModelLoadStats, error)
- type Prompt
- type Response
- func (r *Response) Done() bool
- func (r *Response) EvalDuration() time.Duration
- func (r *Response) LoadDuration() time.Duration
- func (r *Response) NumTokens() int
- func (r *Response) OutputTokensPerSecond() float64
- func (r *Response) PromptEvalDuration() time.Duration
- func (r *Response) String() string
- func (r *Response) Text() string
- func (r *Response) TimePerOutputTokenAverage() time.Duration
- func (r *Response) TimePerOutputTokenQuantile(quantile float64) time.Duration
- func (r *Response) TimeToFirstToken() time.Duration
- func (r *Response) TimeToLastToken() time.Duration
- func (r *Response) TokenGenerationStdDev() time.Duration
- func (r *Response) TotalDuration() time.Duration
- type ResponseMetrics
- type Server
Constants ¶
const (
	// Port is the port used by the ollama server.
	Port = 11434
)
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ConversationContext ¶
type ConversationContext []int
ConversationContext represents a conversational context. It is returned by a response and may be passed to a follow-up prompt.
type EmbeddingResponse ¶
type EmbeddingResponse struct {
	// Model is the model used to generate the embeddings.
	Model *Model

	// Embeddings is the list of embeddings generated for the given inputs.
	Embeddings []Embedding

	// TotalDuration is the total duration of the embedding request as
	// measured by the server, not the client.
	TotalDuration time.Duration

	// LoadDuration is the duration of the embedding model load time as
	// measured by the server, not the client.
	LoadDuration time.Duration

	// PromptEvalCount is the number of prompt evaluations performed by the
	// server.
	PromptEvalCount int

	// ResponseMetrics contains HTTP response metrics as perceived by the
	// client.
	ResponseMetrics ResponseMetrics
}
EmbeddingResponse represents the result of running an embedding model on a set of inputs.
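A minimal usage sketch follows. The sketches in this document assume the package is imported as `ollama` (import path as in the gVisor tree), that standard imports (context, fmt, strings, time, os/exec) are present as needed, and that `llm` denotes a ready client obtained from New or NewDocker below; the model name here is illustrative.

func embedInputs(ctx context.Context, llm *ollama.Ollama) error {
	// "all-minilm" is an illustrative, embedding-capable model name.
	model := &ollama.Model{Name: "all-minilm"}
	resp, err := llm.Embed(ctx, model, []string{"first input", "second input"})
	if err != nil {
		return err
	}
	// Contrast server-side timing with client-side HTTP timing.
	fmt.Printf("%d embeddings; server total %v; client TTFB %v\n",
		len(resp.Embeddings), resp.TotalDuration, resp.ResponseMetrics.TimeToFirstByte())
	return nil
}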
type Model ¶
type Model struct {
	// Name is the name of the ollama model, e.g. "codellama:7b".
	Name string

	// Options maps parameter names to JSON-compatible values.
	Options map[string]any
}
Model encodes a model and options for it.
func ZeroTemperatureModel ¶
ZeroTemperatureModel returns a Model with the given name and an initial temperature setting of zero. This makes responses more deterministic, which allows for consistent results across test runs.
func (*Model) RaiseTemperature ¶
func (m *Model) RaiseTemperature()
RaiseTemperature increases the "temperature" option of the model, if any.
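Taken together, a test can start deterministic and add randomness only when needed. A short sketch (the model name is illustrative, and ZeroTemperatureModel is assumed to take the model name, as its description above suggests):

func deterministicThenHotter() *ollama.Model {
	// Start fully deterministic for reproducible assertions.
	model := ollama.ZeroTemperatureModel("codellama:7b")
	// If the deterministic answer proves unusable, add randomness:
	// RaiseTemperature bumps the "temperature" option, if present.
	model.RaiseTemperature()
	return model
}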
type ModelLoadStats ¶
type ModelLoadStats struct {
	// ClientReportedDuration is the duration to load the model as perceived
	// by the client, measured by HTTP client metrics.
	ClientReportedDuration time.Duration
}
ModelLoadStats holds metrics about the model loading process.
type Ollama ¶
type Ollama struct {
	// ModelNames is the list of available model names.
	ModelNames []string

	// HasGPU is set depending on whether the LLM has GPU access.
	// ollama supports running both on CPU and GPU, and detects this
	// by spawning nvidia-smi.
	HasGPU bool

	// contains filtered or unexported fields
}
Ollama is an ollama client.
func New ¶
New starts a new Ollama server in the given container, then waits for it to serve and returns the client.
func NewDocker ¶
func NewDocker(ctx context.Context, cont *dockerutil.Container, logger testutil.Logger) (*Ollama, error)
NewDocker returns a new Ollama client talking to an Ollama server that runs in a local Docker container.
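A construction sketch, assuming the surrounding test harness already provides the dockerutil.Container and testutil.Logger (both from gVisor's test packages):

func newClient(ctx context.Context, cont *dockerutil.Container, logger testutil.Logger) (*ollama.Ollama, error) {
	llm, err := ollama.NewDocker(ctx, cont, logger)
	if err != nil {
		return nil, fmt.Errorf("starting ollama: %w", err)
	}
	// Construction already waits for the server to serve; an extra bounded
	// wait is a cheap guard in long-running pipelines.
	waitCtx, cancel := context.WithTimeout(ctx, 3*time.Minute)
	defer cancel()
	if err := llm.WaitUntilServing(waitCtx); err != nil {
		return nil, fmt.Errorf("ollama not serving: %w", err)
	}
	return llm, nil
}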
func (*Ollama) Embed ¶
func (llm *Ollama) Embed(ctx context.Context, model *Model, inputs []string) (*EmbeddingResponse, error)
Embed generates embeddings for each of the given inputs.
func (*Ollama) Prompt ¶
func (llm *Ollama) Prompt(ctx context.Context, prompt *Prompt) (*Response, error)
Prompt issues the given prompt to the server and returns its response.
func (*Ollama) PromptUntil ¶
func (llm *Ollama) PromptUntil(ctx context.Context, prompt *Prompt, iterate func(*Prompt, *Response) (*Prompt, error)) (*Response, error)
PromptUntil repeatedly issues a prompt until `iterate` returns a nil error. `iterate` may optionally return an updated `Prompt` which will be used to follow up. This is useful to work around the flakiness of LLMs in tests.
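A retry sketch (assuming WithHotterModel returns a *Prompt, as its description below suggests; the query and model name are illustrative):

func promptForHello(ctx context.Context, llm *ollama.Ollama) (*ollama.Response, error) {
	prompt := &ollama.Prompt{
		Model: ollama.ZeroTemperatureModel("codellama:7b"),
		Query: "Reply with the single word: hello.",
	}
	return llm.PromptUntil(ctx, prompt, func(p *ollama.Prompt, r *ollama.Response) (*ollama.Prompt, error) {
		if strings.Contains(strings.ToLower(r.Text()), "hello") {
			return p, nil // a nil error ends the retry loop
		}
		// Escape a deterministic-but-wrong answer by retrying hotter.
		return p.WithHotterModel(), fmt.Errorf("unexpected reply: %q", r.Text())
	})
}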
func (*Ollama) SetCheapModels ¶
func (llm *Ollama) SetCheapModels(cheapModels []*Model)
SetCheapModels informs this Ollama client of a list of models known to be cheap. This is useful when forcefully unloading models by swapping them with another one, to ensure that the swapped-in model is small. Therefore, at least two models should be specified here.
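For example (model names are illustrative; any two small models work):

llm.SetCheapModels([]*ollama.Model{
	{Name: "tinyllama"},
	{Name: "qwen:0.5b"},
})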
func (*Ollama) WaitUntilServing ¶
func (llm *Ollama) WaitUntilServing(ctx context.Context) error
WaitUntilServing waits until ollama is serving, or the context expires.
func (*Ollama) WarmModel ¶
func (llm *Ollama) WarmModel(ctx context.Context, model *Model, keepWarmFor time.Duration, unloadFirst bool) (*ModelLoadStats, error)
WarmModel pre-warms a model in memory and keeps it warm for `keepWarmFor`. If `unloadFirst` is true, another model will be loaded before loading the requested model; this ensures that the requested model is loaded from a cold state.
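A sketch measuring cold-load time (the keep-warm duration is illustrative):

func coldLoadDuration(ctx context.Context, llm *ollama.Ollama, model *ollama.Model) (time.Duration, error) {
	// unloadFirst=true swaps in another (ideally cheap; see SetCheapModels)
	// model first, so the measured load starts from a cold state.
	stats, err := llm.WarmModel(ctx, model, 10*time.Minute, true /* unloadFirst */)
	if err != nil {
		return 0, err
	}
	return stats.ClientReportedDuration, nil
}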
type Prompt ¶
type Prompt struct {
	// Model is the model to query.
	Model *Model

	// If set, keep the model alive in memory for the given duration after this
	// prompt is answered. A zero duration will use the ollama default (a few
	// minutes). Note that model unloading is asynchronous, so the model may
	// not be fully unloaded until some time after `KeepModelAlive` has elapsed
	// past the prompt response.
	KeepModelAlive time.Duration

	// Query is the prompt string.
	// Common leading whitespace will be removed.
	Query string

	// Context is the conversational context to follow up on, if any.
	// This is returned from `Response`.
	Context ConversationContext

	// contains filtered or unexported fields
}
Prompt is an ollama prompt.
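A construction sketch (assuming `llm` and `ctx` are in scope; the model name and query are illustrative):

func sayHello(ctx context.Context, llm *ollama.Ollama) error {
	prompt := &ollama.Prompt{
		Model:          ollama.ZeroTemperatureModel("codellama:7b"),
		KeepModelAlive: 30 * time.Second,
		// Common indentation in Query is stripped before sending; see
		// CleanQuery below.
		Query: `
			Reply with the single word: hello.
		`,
	}
	resp, err := llm.Prompt(ctx, prompt)
	if err != nil {
		return err
	}
	fmt.Println(resp.Text())
	return nil
}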
func (*Prompt) AddImage ¶
AddImage attaches an image to the prompt. Returns itself for chainability.
func (*Prompt) CleanQuery ¶
CleanQuery removes common whitespace from query lines, and all leading/ending whitespace-only lines. It makes it possible to specify query strings as indented string literals without breaking visual continuity in Go code. For example (where dots are spaces):

"""\n
..The Quick Brown Fox\n
..Jumps Over\n
....The Lazy Dog\n
."""

becomes:

"""The Quick Brown Fox\nJumps Over\n..The Lazy Dog"""
func (*Prompt) WithHotterModel ¶
WithHotterModel returns a copy of this prompt with the same model having a higher temperature.
type Response ¶
type Response struct {
// contains filtered or unexported fields
}
Response represents a response to a query from Ollama.
func (*Response) EvalDuration ¶
EvalDuration returns the response evaluation time.
func (*Response) LoadDuration ¶
LoadDuration returns the model load time as reported by the ollama server.
func (*Response) OutputTokensPerSecond ¶
OutputTokensPerSecond computes the average number of output tokens generated per second.
func (*Response) PromptEvalDuration ¶
PromptEvalDuration returns the prompt evaluation time.
func (*Response) TimePerOutputTokenAverage ¶
TimePerOutputTokenAverage computes the average time to generate an output token.
func (*Response) TimePerOutputTokenQuantile ¶
TimePerOutputTokenQuantile computes a quantile of the time it takes to generate an output token.
func (*Response) TimeToFirstToken ¶
TimeToFirstToken returns the time it took between the request starting and the first token being received by the client.
func (*Response) TimeToLastToken ¶
TimeToLastToken returns the time it took between the request starting and the last token being received by the client.
func (*Response) TokenGenerationStdDev ¶
TokenGenerationStdDev returns the standard deviation of the time between token generations.
func (*Response) TotalDuration ¶
TotalDuration returns the total response generation time.
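The metrics above combine naturally into a benchmark summary; a sketch:

func logPerf(r *ollama.Response) {
	fmt.Printf("tokens generated:      %d\n", r.NumTokens())
	fmt.Printf("throughput:            %.1f tokens/sec\n", r.OutputTokensPerSecond())
	fmt.Printf("time to first token:   %v\n", r.TimeToFirstToken())
	fmt.Printf("median per-token time: %v\n", r.TimePerOutputTokenQuantile(0.5))
	fmt.Printf("per-token stddev:      %v\n", r.TokenGenerationStdDev())
	fmt.Printf("total %v (load %v, prompt eval %v, eval %v)\n",
		r.TotalDuration(), r.LoadDuration(), r.PromptEvalDuration(), r.EvalDuration())
}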
type ResponseMetrics ¶
type ResponseMetrics struct {
	// ProgramStarted is the time when the program started.
	ProgramStarted time.Time `json:"program_started"`

	// RequestSent is the time when the HTTP request was sent.
	RequestSent time.Time `json:"request_sent"`

	// ResponseReceived is the time when the HTTP response headers were
	// received.
	ResponseReceived time.Time `json:"response_received"`

	// FirstByteRead is the time when the first HTTP response body byte was
	// read.
	FirstByteRead time.Time `json:"first_byte_read"`

	// LastByteRead is the time when the last HTTP response body byte was
	// read.
	LastByteRead time.Time `json:"last_byte_read"`
}
ResponseMetrics are HTTP request metrics from an ollama API query. This is the same JSON struct as defined in `images/gpu/ollama/client/client.go`.
func (*ResponseMetrics) TimeToFirstByte ¶
func (rm *ResponseMetrics) TimeToFirstByte() time.Duration
TimeToFirstByte returns the duration it took between the request being sent and the first byte of the response being read.
func (*ResponseMetrics) TimeToLastByte ¶
func (rm *ResponseMetrics) TimeToLastByte() time.Duration
TimeToLastByte returns the duration it took between the request being sent and the last byte of the response being read.
type Server ¶
type Server interface {
	// InstrumentedRequest performs an instrumented HTTP request against the
	// ollama server, using the `gpu/ollama_client` ollama image.
	// `argvFn` takes in a `protocol://host:port` string and returns a
	// command-line to use for making an instrumented HTTP request against
	// the ollama server.
	// InstrumentedRequest should return the logs from the request container.
	InstrumentedRequest(ctx context.Context, argvFn func(hostPort string) []string) ([]byte, error)

	// Logs retrieves logs from the server.
	Logs(ctx context.Context) (string, error)
}
Server performs requests against an ollama server.
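Concrete implementations live outside this package. As a sketch only, a hypothetical host-local implementation could run the generated command directly, assuming that command is runnable on the host rather than inside the gpu/ollama_client container (type name and approach are illustrative, not part of this package):

// localServer talks to an ollama instance reachable from the host.
type localServer struct {
	hostPort string // e.g. "127.0.0.1:11434"
}

// Compile-time check that the interface is satisfied.
var _ ollama.Server = (*localServer)(nil)

func (s *localServer) InstrumentedRequest(ctx context.Context, argvFn func(hostPort string) []string) ([]byte, error) {
	argv := argvFn("http://" + s.hostPort)
	// Run the request command on the host and return its combined output in
	// place of container logs.
	return exec.CommandContext(ctx, argv[0], argv[1:]...).CombinedOutput()
}

func (s *localServer) Logs(ctx context.Context) (string, error) {
	return "", nil // a bare local server has no retrievable logs
}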