Documentation ¶
Overview ¶
Package ollama provides an Ollama API client.
Index ¶
- Constants
- type ConversationContext
- type Embedding
- type EmbeddingResponse
- type Model
- type ModelLoadStats
- type Ollama
- func (llm *Ollama) Embed(ctx context.Context, model *Model, inputs []string) (*EmbeddingResponse, error)
- func (llm *Ollama) Prompt(ctx context.Context, prompt *Prompt) (*Response, error)
- func (llm *Ollama) PromptUntil(ctx context.Context, prompt *Prompt, ...) (*Response, error)
- func (llm *Ollama) SetCheapModels(cheapModels []*Model)
- func (llm *Ollama) WaitUntilServing(ctx context.Context) error
- func (llm *Ollama) WarmModel(ctx context.Context, model *Model, keepWarmFor time.Duration, unloadFirst bool) (*ModelLoadStats, error)
- type Prompt
- type Response
- func (r *Response) Done() bool
- func (r *Response) EvalDuration() time.Duration
- func (r *Response) LoadDuration() time.Duration
- func (r *Response) NumTokens() int
- func (r *Response) OutputTokensPerSecond() float64
- func (r *Response) PromptEvalDuration() time.Duration
- func (r *Response) String() string
- func (r *Response) Text() string
- func (r *Response) TimePerOutputTokenAverage() time.Duration
- func (r *Response) TimePerOutputTokenQuantile(quantile float64) time.Duration
- func (r *Response) TimeToFirstToken() time.Duration
- func (r *Response) TimeToLastToken() time.Duration
- func (r *Response) TokenGenerationStdDev() time.Duration
- func (r *Response) TotalDuration() time.Duration
- type ResponseMetrics
- type Server
Constants ¶
const (
	// Port is the port used by the ollama server.
	Port = 11434
)
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ConversationContext ¶
type ConversationContext []int
ConversationContext represents a conversational context. It is returned by a response and may be passed to a follow-up prompt.
type EmbeddingResponse ¶
type EmbeddingResponse struct {
	// Model is the model used to generate the embeddings.
	Model *Model

	// Embeddings is the list of embeddings generated for the given inputs.
	Embeddings []Embedding

	// TotalDuration is the total duration of the embedding request as
	// measured by the server, not the client.
	TotalDuration time.Duration

	// LoadDuration is the duration of the embedding model load time as
	// measured by the server, not the client.
	LoadDuration time.Duration

	// PromptEvalCount is the number of prompt evaluations performed by the
	// server.
	PromptEvalCount int

	// ResponseMetrics contains HTTP response metrics as perceived by the
	// client.
	ResponseMetrics ResponseMetrics
}
EmbeddingResponse represents the result of running an embedding model on a set of inputs.
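A minimal usage sketch follows. The sketches in this document assume the package is imported as `ollama` (import path as in the gVisor tree), that standard imports (context, fmt, strings, time, os/exec) are present as needed, and that `llm` denotes a ready client obtained from New or NewDocker below; the model name here is illustrative.

func embedInputs(ctx context.Context, llm *ollama.Ollama) error {
	// "all-minilm" is an illustrative, embedding-capable model name.
	model := &ollama.Model{Name: "all-minilm"}
	resp, err := llm.Embed(ctx, model, []string{"first input", "second input"})
	if err != nil {
		return err
	}
	// Contrast server-side timing with client-side HTTP timing.
	fmt.Printf("%d embeddings; server total %v; client TTFB %v\n",
		len(resp.Embeddings), resp.TotalDuration, resp.ResponseMetrics.TimeToFirstByte())
	return nil
}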
type Model ¶
type Model struct {
	// Name is the name of the ollama model, e.g. "codellama:7b".
	Name string

	// Options maps parameter names to JSON-compatible values.
	Options map[string]any
}
Model encodes a model and options for it.
func ZeroTemperatureModel ¶
ZeroTemperatureModel returns a Model with the given name and an initial temperature setting of zero. This makes responses more deterministic, which allows for consistent results across test runs.
func (*Model) RaiseTemperature ¶
func (m *Model) RaiseTemperature()
RaiseTemperature increases the "temperature" option of the model, if any.
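Taken together, a test can start deterministic and add randomness only when needed. A short sketch (the model name is illustrative, and ZeroTemperatureModel is assumed to take the model name, as its description above suggests):

func deterministicThenHotter() *ollama.Model {
	// Start fully deterministic for reproducible assertions.
	model := ollama.ZeroTemperatureModel("codellama:7b")
	// If the deterministic answer proves unusable, add randomness:
	// RaiseTemperature bumps the "temperature" option, if present.
	model.RaiseTemperature()
	return model
}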
type ModelLoadStats ¶
type ModelLoadStats struct {
	// ClientReportedDuration is the duration to load the model as perceived
	// by the client, measured by HTTP client metrics.
	ClientReportedDuration time.Duration
}
ModelLoadStats holds metrics about the model loading process.
type Ollama ¶
type Ollama struct {
	// ModelNames is the list of available model names.
	ModelNames []string

	// HasGPU is set depending on whether the LLM has GPU access.
	// ollama supports running both on CPU and GPU, and detects this
	// by spawning nvidia-smi.
	HasGPU bool

	// contains filtered or unexported fields
}
Ollama is an ollama client.
func New ¶
New starts a new Ollama server in the given container, then waits for it to serve and returns the client.
func NewDocker ¶
func NewDocker(ctx context.Context, cont *dockerutil.Container, logger testutil.Logger) (*Ollama, error)
NewDocker returns a new Ollama client talking to an Ollama server that runs in a local Docker container.
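A construction sketch, assuming the surrounding test harness already provides the dockerutil.Container and testutil.Logger (both from gVisor's test packages):

func newClient(ctx context.Context, cont *dockerutil.Container, logger testutil.Logger) (*ollama.Ollama, error) {
	llm, err := ollama.NewDocker(ctx, cont, logger)
	if err != nil {
		return nil, fmt.Errorf("starting ollama: %w", err)
	}
	// Construction already waits for the server to serve; an extra bounded
	// wait is a cheap guard in long-running pipelines.
	waitCtx, cancel := context.WithTimeout(ctx, 3*time.Minute)
	defer cancel()
	if err := llm.WaitUntilServing(waitCtx); err != nil {
		return nil, fmt.Errorf("ollama not serving: %w", err)
	}
	return llm, nil
}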
func (*Ollama) Embed ¶
func (llm *Ollama) Embed(ctx context.Context, model *Model, inputs []string) (*EmbeddingResponse, error)
Embed generates embeddings for each of the given inputs.
func (*Ollama) Prompt ¶
func (llm *Ollama) Prompt(ctx context.Context, prompt *Prompt) (*Response, error)
Prompt issues the given prompt to the server and returns its response.
func (*Ollama) PromptUntil ¶
func (llm *Ollama) PromptUntil(ctx context.Context, prompt *Prompt, iterate func(*Prompt, *Response) (*Prompt, error)) (*Response, error)
PromptUntil repeatedly issues a prompt until `iterate` returns a nil error. `iterate` may optionally return an updated `Prompt` which will be used to follow up. This is useful to work around the flakiness of LLMs in tests.
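A retry sketch (assuming WithHotterModel returns a *Prompt, as its description below suggests; the query and model name are illustrative):

func promptForHello(ctx context.Context, llm *ollama.Ollama) (*ollama.Response, error) {
	prompt := &ollama.Prompt{
		Model: ollama.ZeroTemperatureModel("codellama:7b"),
		Query: "Reply with the single word: hello.",
	}
	return llm.PromptUntil(ctx, prompt, func(p *ollama.Prompt, r *ollama.Response) (*ollama.Prompt, error) {
		if strings.Contains(strings.ToLower(r.Text()), "hello") {
			return p, nil // a nil error ends the retry loop
		}
		// Escape a deterministic-but-wrong answer by retrying hotter.
		return p.WithHotterModel(), fmt.Errorf("unexpected reply: %q", r.Text())
	})
}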
func (*Ollama) SetCheapModels ¶
func (llm *Ollama) SetCheapModels(cheapModels []*Model)
SetCheapModels informs this Ollama client of a list of models known to be cheap. This is useful when forcefully unloading models by swapping them with another one, to ensure that the swapped-in model is small. Therefore, at least two models should be specified here.
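For example (model names are illustrative; any two small models work):

llm.SetCheapModels([]*ollama.Model{
	{Name: "tinyllama"},
	{Name: "qwen:0.5b"},
})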
func (*Ollama) WaitUntilServing ¶
func (llm *Ollama) WaitUntilServing(ctx context.Context) error
WaitUntilServing waits until ollama is serving, or the context expires.
func (*Ollama) WarmModel ¶
func (llm *Ollama) WarmModel(ctx context.Context, model *Model, keepWarmFor time.Duration, unloadFirst bool) (*ModelLoadStats, error)
WarmModel pre-warms a model in memory and keeps it warm for `keepWarmFor`. If `unloadFirst` is true, another model will be loaded before loading the requested model; this ensures that the requested model is loaded from a cold state.
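A sketch measuring cold-load time (the keep-warm duration is illustrative):

func coldLoadDuration(ctx context.Context, llm *ollama.Ollama, model *ollama.Model) (time.Duration, error) {
	// unloadFirst=true swaps in another (ideally cheap; see SetCheapModels)
	// model first, so the measured load starts from a cold state.
	stats, err := llm.WarmModel(ctx, model, 10*time.Minute, true /* unloadFirst */)
	if err != nil {
		return 0, err
	}
	return stats.ClientReportedDuration, nil
}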
type Prompt ¶
type Prompt struct {
	// Model is the model to query.
	Model *Model

	// If set, keep the model alive in memory for the given duration after this
	// prompt is answered. A zero duration will use the ollama default (a few
	// minutes). Note that model unloading is asynchronous, so the model may
	// not be fully unloaded until some time after `KeepModelAlive` has elapsed
	// past the prompt response.
	KeepModelAlive time.Duration

	// Query is the prompt string.
	// Common leading whitespace will be removed.
	Query string

	// Context is the conversational context to follow up on, if any.
	// This is returned from `Response`.
	Context ConversationContext

	// contains filtered or unexported fields
}
Prompt is an ollama prompt.
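A construction sketch (assuming `llm` and `ctx` are in scope; the model name and query are illustrative):

func sayHello(ctx context.Context, llm *ollama.Ollama) error {
	prompt := &ollama.Prompt{
		Model:          ollama.ZeroTemperatureModel("codellama:7b"),
		KeepModelAlive: 30 * time.Second,
		// Common indentation in Query is stripped before sending; see
		// CleanQuery below.
		Query: `
			Reply with the single word: hello.
		`,
	}
	resp, err := llm.Prompt(ctx, prompt)
	if err != nil {
		return err
	}
	fmt.Println(resp.Text())
	return nil
}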
func (*Prompt) AddImage ¶
AddImage attaches an image to the prompt. Returns itself for chainability.
func (*Prompt) CleanQuery ¶
CleanQuery removes common whitespace from query lines, and all leading/ending whitespace-only lines. It makes it possible to specify query strings as indented string literals without breaking visual continuity in Go code. For example (where dots are spaces):

"""\n
..The Quick Brown Fox\n
..Jumps Over\n
....The Lazy Dog\n
."""

becomes:

"""The Quick Brown Fox\nJumps Over\n..The Lazy Dog"""
func (*Prompt) WithHotterModel ¶
WithHotterModel returns a copy of this prompt with the same model having a higher temperature.
type Response ¶
type Response struct {
// contains filtered or unexported fields
}
Response represents a response to a query from Ollama.
func (*Response) EvalDuration ¶
EvalDuration returns the response evaluation time.
func (*Response) LoadDuration ¶
LoadDuration returns the model load time as reported by the ollama server.
func (*Response) OutputTokensPerSecond ¶
OutputTokensPerSecond computes the average number of output tokens generated per second.
func (*Response) PromptEvalDuration ¶
PromptEvalDuration returns the prompt evaluation time.
func (*Response) TimePerOutputTokenAverage ¶
TimePerOutputTokenAverage computes the average time to generate an output token.
func (*Response) TimePerOutputTokenQuantile ¶
TimePerOutputTokenQuantile computes a quantile of the time it takes to generate an output token.
func (*Response) TimeToFirstToken ¶
TimeToFirstToken returns the time it took between the request starting and the first token being received by the client.
func (*Response) TimeToLastToken ¶
TimeToLastToken returns the time it took between the request starting and the last token being received by the client.
func (*Response) TokenGenerationStdDev ¶
TokenGenerationStdDev returns the standard deviation of the time between token generations.
func (*Response) TotalDuration ¶
TotalDuration returns the total response generation time.
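The metrics above combine naturally into a benchmark summary; a sketch:

func logPerf(r *ollama.Response) {
	fmt.Printf("tokens generated:      %d\n", r.NumTokens())
	fmt.Printf("throughput:            %.1f tokens/sec\n", r.OutputTokensPerSecond())
	fmt.Printf("time to first token:   %v\n", r.TimeToFirstToken())
	fmt.Printf("median per-token time: %v\n", r.TimePerOutputTokenQuantile(0.5))
	fmt.Printf("per-token stddev:      %v\n", r.TokenGenerationStdDev())
	fmt.Printf("total %v (load %v, prompt eval %v, eval %v)\n",
		r.TotalDuration(), r.LoadDuration(), r.PromptEvalDuration(), r.EvalDuration())
}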
type ResponseMetrics ¶
type ResponseMetrics struct {
	// ProgramStarted is the time when the program started.
	ProgramStarted time.Time `json:"program_started"`

	// RequestSent is the time when the HTTP request was sent.
	RequestSent time.Time `json:"request_sent"`

	// ResponseReceived is the time when the HTTP response headers were
	// received.
	ResponseReceived time.Time `json:"response_received"`

	// FirstByteRead is the time when the first HTTP response body byte was
	// read.
	FirstByteRead time.Time `json:"first_byte_read"`

	// LastByteRead is the time when the last HTTP response body byte was
	// read.
	LastByteRead time.Time `json:"last_byte_read"`
}
ResponseMetrics are HTTP request metrics from an ollama API query. This is the same JSON struct as defined in `images/gpu/ollama/client/client.go`.
func (*ResponseMetrics) TimeToFirstByte ¶
func (rm *ResponseMetrics) TimeToFirstByte() time.Duration
TimeToFirstByte returns the duration it took between the request being sent and the first byte of the response being read.
func (*ResponseMetrics) TimeToLastByte ¶
func (rm *ResponseMetrics) TimeToLastByte() time.Duration
TimeToLastByte returns the duration it took between the request being sent and the last byte of the response being read.
type Server ¶
type Server interface {
	// InstrumentedRequest performs an instrumented HTTP request against the
	// ollama server, using the `gpu/ollama_client` ollama image.
	// `argvFn` takes in a `protocol://host:port` string and returns a
	// command-line to use for making an instrumented HTTP request against
	// the ollama server.
	// InstrumentedRequest should return the logs from the request container.
	InstrumentedRequest(ctx context.Context, argvFn func(hostPort string) []string) ([]byte, error)

	// Logs retrieves logs from the server.
	Logs(ctx context.Context) (string, error)
}
Server performs requests against an ollama server.
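Concrete implementations live outside this package. As a sketch only, a hypothetical host-local implementation could run the generated command directly, assuming that command is runnable on the host rather than inside the gpu/ollama_client container (type name and approach are illustrative, not part of this package):

// localServer talks to an ollama instance reachable from the host.
type localServer struct {
	hostPort string // e.g. "127.0.0.1:11434"
}

// Compile-time check that the interface is satisfied.
var _ ollama.Server = (*localServer)(nil)

func (s *localServer) InstrumentedRequest(ctx context.Context, argvFn func(hostPort string) []string) ([]byte, error) {
	argv := argvFn("http://" + s.hostPort)
	// Run the request command on the host and return its combined output in
	// place of container logs.
	return exec.CommandContext(ctx, argv[0], argv[1:]...).CombinedOutput()
}

func (s *localServer) Logs(ctx context.Context) (string, error) {
	return "", nil // a bare local server has no retrievable logs
}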