ollama

package
v0.0.0-...-a5542f1
Published: Nov 21, 2024 License: Apache-2.0, MIT Imports: 14 Imported by: 0

Documentation

Overview

Package ollama provides an Ollama API client.

Index

Constants

const (
	// Port is the port used by the ollama server.
	Port = 11434
)
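For illustration, a minimal sketch of turning Port into a server address, assuming this package is imported as ollama; the host name and API path below are placeholders, not anything defined by this package:

// buildGenerateURL is a hypothetical helper joining an assumed host name with
// the Port constant. Real tests obtain the host from the container or from
// the Server implementation.
func buildGenerateURL(host string) string {
	return "http://" + net.JoinHostPort(host, strconv.Itoa(ollama.Port)) + "/api/generate"
}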

Variables

This section is empty.

Functions

This section is empty.

Types

type ConversationContext

type ConversationContext []int

ConversationContext represents a conversational context. It is returned as part of a response and may be passed to a follow-up prompt.

type Model

type Model struct {
	// Name is the name of the ollama model, e.g. "codellama:7b".
	Name string

	// Options maps parameter names to JSON-compatible values.
	Options map[string]any
}

Model encodes a model and options for it.

func ZeroTemperatureModel

func ZeroTemperatureModel(name string) *Model

ZeroTemperatureModel returns a Model with the given name and an initial temperature setting of zero. A zero temperature makes the model's output as deterministic as possible, so repeated prompts yield consistent results.
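A brief usage sketch, assuming this package is imported as ollama; the model name and extra option are examples only:

// Create a deterministic model and add an extra server option.
model := ollama.ZeroTemperatureModel("codellama:7b")
model.Options["num_predict"] = 64 // Example option; any JSON-compatible value works.
fmt.Println(model) // Prints "codellama:7b" via the Model's String method.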

func (*Model) Copy

func (m *Model) Copy() *Model

Copy returns a copy of the model.

func (*Model) RaiseTemperature

func (m *Model) RaiseTemperature()

RaiseTemperature increases the "temperature" option of the model, if any.

func (*Model) String

func (m *Model) String() string

String returns the model's name.

type ModelLoadStats

type ModelLoadStats struct {
	// ClientReportedDuration is the duration to load the model as perceived
	// by the client, measured by HTTP client metrics.
	ClientReportedDuration time.Duration
}

ModelLoadStats holds metrics about the model loading process.

type Ollama

type Ollama struct {

	// ModelNames is the list of available model names.
	ModelNames []string

	// HasGPU is set depending on whether the LLM has GPU access.
	// ollama supports running both on CPU and GPU, and detects this
	// by spawning nvidia-smi.
	HasGPU bool
	// contains filtered or unexported fields
}

Ollama is an ollama client.

func New

func New(ctx context.Context, server Server, logger testutil.Logger) (*Ollama, error)

New creates an Ollama client backed by the given server, waits until the server is serving, and returns the client.

func NewDocker

func NewDocker(ctx context.Context, cont *dockerutil.Container, logger testutil.Logger) (*Ollama, error)

NewDocker returns a new Ollama client talking to an Ollama server that runs in a local Docker container.
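A setup sketch for tests, assuming the ollama container has already been created with gVisor's dockerutil helpers and that the *testing.T in use satisfies testutil.Logger (it provides the Name and Logf methods that interface requires):

// setupLLM is a hypothetical test helper that wraps an existing ollama
// container in a client and fails the test if the server never comes up.
func setupLLM(ctx context.Context, t *testing.T, cont *dockerutil.Container) *ollama.Ollama {
	llm, err := ollama.NewDocker(ctx, cont, t)
	if err != nil {
		t.Fatalf("cannot create ollama client: %v", err)
	}
	return llm
}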

func (*Ollama) Prompt

func (llm *Ollama) Prompt(ctx context.Context, prompt *Prompt) (*Response, error)

Prompt issues the given `prompt` against its `Model` and returns the result.

func (*Ollama) PromptUntil

func (llm *Ollama) PromptUntil(ctx context.Context, prompt *Prompt, iterate func(*Prompt, *Response) (*Prompt, error)) (*Response, error)

PromptUntil repeatedly issues a prompt until `iterate` returns a nil error. `iterate` may optionally return an updated `Prompt` which will be used to follow up. This is useful to work around the flakiness of LLMs in tests.
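A sketch of an iterate callback; the expected substring and the llm, ctx, and prompt variables are assumptions for illustration:

// Retry until the response mentions "Lisbon", raising the model temperature
// on each failed attempt to get more varied output.
resp, err := llm.PromptUntil(ctx, prompt, func(p *ollama.Prompt, r *ollama.Response) (*ollama.Prompt, error) {
	if strings.Contains(r.Text(), "Lisbon") {
		return p, nil // nil error: accept this response.
	}
	// Non-nil error: PromptUntil retries with the returned prompt.
	return p.WithHotterModel(), fmt.Errorf("unexpected response: %q", r.Text())
})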

func (*Ollama) SetCheapModels

func (llm *Ollama) SetCheapModels(cheapModels []*Model)

SetCheapModels informs this Ollama client of a set of models known to be cheap to load. This is useful when forcefully unloading a model by swapping it out for another one, as the replacement should be small. At least two models should be specified here, so that whichever model is currently loaded can always be swapped with a different one.
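For example (the model names are placeholders, not a list this package defines):

// Register two small models that can be swapped in to force-unload larger ones.
llm.SetCheapModels([]*ollama.Model{
	ollama.ZeroTemperatureModel("tinyllama"),
	ollama.ZeroTemperatureModel("qwen2:0.5b"),
})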

func (*Ollama) WaitUntilServing

func (llm *Ollama) WaitUntilServing(ctx context.Context) error

WaitUntilServing waits until ollama is serving, or the context expires.

func (*Ollama) WarmModel

func (llm *Ollama) WarmModel(ctx context.Context, model *Model, keepWarmFor time.Duration, unloadFirst bool) (*ModelLoadStats, error)

WarmModel pre-warms a model in memory and keeps it warm for `keepWarmFor`. If `unloadFirst` is true, another model is loaded first, so that the requested model is guaranteed to load from a cold state.
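A usage sketch for benchmarks; the duration and the llm, ctx, and model variables are illustrative:

// Load the model from a cold state and keep it resident for the benchmark run.
stats, err := llm.WarmModel(ctx, model, 10*time.Minute, true /* unloadFirst */)
if err != nil {
	return err
}
fmt.Printf("cold model load took %v\n", stats.ClientReportedDuration)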

type Prompt

type Prompt struct {
	// Model is the model to query.
	Model *Model

	// If set, keep the model alive in memory for the given duration after this
	// prompt is answered. A zero duration uses the ollama default (a few
	// minutes). Note that model unloading is asynchronous, so the model may not
	// be fully unloaded until some time after `KeepModelAlive` has elapsed.
	KeepModelAlive time.Duration

	// Query is the prompt string.
	// Common leading whitespace will be removed.
	Query string

	// Context is the conversational context to follow up on, if any.
	// This is returned from `Response`.
	Context ConversationContext
	// contains filtered or unexported fields
}

Prompt is an ollama prompt.
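A construction sketch, assuming an already-created client llm; the model name and query are examples:

prompt := &ollama.Prompt{
	Model: ollama.ZeroTemperatureModel("codellama:7b"),
	// Indented raw strings are fine: common leading whitespace is removed.
	Query: `
		Write a single sentence describing what a container is.
	`,
	KeepModelAlive: 2 * time.Minute,
}
resp, err := llm.Prompt(ctx, prompt)
if err != nil {
	return err
}
fmt.Println(resp.Text())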

func (*Prompt) AddImage

func (p *Prompt) AddImage(data []byte) *Prompt

AddImage attaches an image to the prompt. It returns the prompt itself so that calls can be chained.

func (*Prompt) CleanQuery

func (p *Prompt) CleanQuery() string

CleanQuery removes common leading whitespace from query lines, as well as all leading and trailing whitespace-only lines. This makes it possible to write query strings as indented raw strings in Go code without breaking visual continuity. For example (where dots are spaces):

	..The Quick Brown Fox
	..Jumps Over
	....The Lazy Dog
	.

becomes:

	The Quick Brown Fox
	Jumps Over
	..The Lazy Dog

func (*Prompt) String

func (p *Prompt) String() string

String returns a human-friendly string representing this prompt.

func (*Prompt) WithHotterModel

func (p *Prompt) WithHotterModel() *Prompt

WithHotterModel returns a copy of this prompt with the same model having a higher temperature.

type PromptJSON

type PromptJSON struct {
	Model     string              `json:"model"`
	Prompt    string              `json:"prompt,omitempty"`
	Images    []string            `json:"images"`
	Stream    bool                `json:"stream"`
	Context   ConversationContext `json:"context"`
	Options   map[string]any      `json:"options"`
	KeepAlive string              `json:"keep_alive,omitempty"`
}

PromptJSON encodes the JSON data for a query.

type Response

type Response struct {
	// contains filtered or unexported fields
}

Response represents a response to a query from Ollama.

func (*Response) Done

func (r *Response) Done() bool

Done returns whether the response was completely generated.

func (*Response) EvalDuration

func (r *Response) EvalDuration() time.Duration

EvalDuration returns the response evaluation time.

func (*Response) LoadDuration

func (r *Response) LoadDuration() time.Duration

LoadDuration returns the model load time as reported by the ollama server.

func (*Response) NumTokens

func (r *Response) NumTokens() int

NumTokens returns the number of tokens in the response.

func (*Response) OutputTokensPerSecond

func (r *Response) OutputTokensPerSecond() float64

OutputTokensPerSecond computes the average number of output tokens generated per second.

func (*Response) PromptEvalDuration

func (r *Response) PromptEvalDuration() time.Duration

PromptEvalDuration returns the prompt evaluation time.

func (*Response) String

func (r *Response) String() string

String returns the response text, if it is done.

func (*Response) Text

func (r *Response) Text() string

Text returns the body of the response, if it is done.

func (*Response) TimePerOutputTokenAverage

func (r *Response) TimePerOutputTokenAverage() time.Duration

TimePerOutputTokenAverage computes the average time to generate an output token.

func (*Response) TimePerOutputTokenQuantile

func (r *Response) TimePerOutputTokenQuantile(quantile float64) time.Duration

TimePerOutputTokenQuantile computes a quantile of the time it takes to generate an output token.

func (*Response) TimeToFirstToken

func (r *Response) TimeToFirstToken() time.Duration

TimeToFirstToken returns the time it took between the request starting and the first token being received by the client.

func (*Response) TimeToLastToken

func (r *Response) TimeToLastToken() time.Duration

TimeToLastToken returns the time it took between the request starting and the last token being received by the client.

func (*Response) TokenGenerationStdDev

func (r *Response) TokenGenerationStdDev() time.Duration

TokenGenerationStdDev returns the standard deviation of the time between token generations.

func (*Response) TotalDuration

func (r *Response) TotalDuration() time.Duration

TotalDuration returns the total response generation time.
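A sketch of reporting a few of these accessors after a successful prompt, where resp is assumed to be a completed *Response:

if resp.Done() {
	fmt.Printf("output tokens:       %d\n", resp.NumTokens())
	fmt.Printf("throughput:          %.1f tokens/sec\n", resp.OutputTokensPerSecond())
	fmt.Printf("time to first token: %v\n", resp.TimeToFirstToken())
	fmt.Printf("p95 time per token:  %v\n", resp.TimePerOutputTokenQuantile(0.95))
	fmt.Printf("total duration:      %v\n", resp.TotalDuration())
}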

type ResponseJSON

type ResponseJSON struct {
	Model           string              `json:"model"`
	CreatedAt       time.Time           `json:"created_at"`
	Response        string              `json:"response"`
	Done            bool                `json:"done"`
	TotalNanos      int                 `json:"total_duration"`
	LoadNanos       int                 `json:"load_duration"`
	EvalCount       int                 `json:"eval_count"`
	EvalNanos       int                 `json:"eval_duration"`
	PromptEvalCount int                 `json:"prompt_eval_count"`
	PromptEvalNanos int                 `json:"prompt_eval_duration"`
	Context         ConversationContext `json:"context"`
}

ResponseJSON is the JSON-format response from ollama about a prompt. Note that in `streamed` mode, the `Response` field contains a single token. To recover the whole response, all `Response` fields must be concatenated until the last `ResponseJSON`, identified as such by the `Done` field.
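As a sketch of the concatenation rule described above (not this package's internal implementation), a caller decoding a stream of these messages could do:

// collectStream concatenates streamed Response fragments until the final
// message, which is marked by Done being true.
func collectStream(dec *json.Decoder) (string, error) {
	var sb strings.Builder
	for {
		var msg ollama.ResponseJSON
		if err := dec.Decode(&msg); err != nil {
			return "", err
		}
		sb.WriteString(msg.Response)
		if msg.Done {
			return sb.String(), nil
		}
	}
}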

type ResponseMetrics

type ResponseMetrics struct {
	// ProgramStarted is the time when the program started.
	ProgramStarted time.Time `json:"program_started"`
	// RequestSent is the time when the HTTP request was sent.
	RequestSent time.Time `json:"request_sent"`
	// ResponseReceived is the time when the HTTP response headers were received.
	ResponseReceived time.Time `json:"response_received"`
	// FirstByteRead is the time when the first HTTP response body byte was read.
	FirstByteRead time.Time `json:"first_byte_read"`
	// LastByteRead is the time when the last HTTP response body byte was read.
	LastByteRead time.Time `json:"last_byte_read"`
}

ResponseMetrics are HTTP request metrics from an ollama API query. This is the same JSON struct as defined in `images/gpu/ollama/client/client.go`.
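The fields are plain timestamps, so client-side latencies can be derived directly, as in this sketch:

// clientLatencies is a hypothetical helper computing client-observed
// time-to-first-byte and total request time from the raw timestamps.
func clientLatencies(m ollama.ResponseMetrics) (firstByte, total time.Duration) {
	firstByte = m.FirstByteRead.Sub(m.RequestSent)
	total = m.LastByteRead.Sub(m.RequestSent)
	return firstByte, total
}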

type Server

type Server interface {
	// InstrumentedRequest performs an instrumented HTTP request against the
	// ollama server, using the `gpu/ollama_client` ollama image.
	// `argvFn` takes in a `protocol://host:port` string and returns a
	// command-line to use for making an instrumented HTTP request against the
	// ollama server.
	// InstrumentedRequest should return the logs from the request container.
	InstrumentedRequest(ctx context.Context, argvFn func(hostPort string) []string) ([]byte, error)

	// Logs retrieves logs from the server.
	Logs(ctx context.Context) (string, error)
}

Server performs requests against an ollama server.
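A sketch of calling InstrumentedRequest; the curl command line and endpoint below are purely illustrative, since the real `gpu/ollama_client` image defines its own command-line interface:

// argvFn receives a "protocol://host:port" string and returns the command
// line the request container should run.
logs, err := server.InstrumentedRequest(ctx, func(hostPort string) []string {
	return []string{"curl", "--silent", hostPort + "/api/tags"}
})
if err != nil {
	return err
}
fmt.Printf("request container logs:\n%s\n", logs)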
