Documentation ¶
Overview ¶
Package ollama provides an Ollama API client.
Index ¶
- Constants
- type ConversationContext
- type Model
- type ModelLoadStats
- type Ollama
- func (llm *Ollama) Prompt(ctx context.Context, prompt *Prompt) (*Response, error)
- func (llm *Ollama) PromptUntil(ctx context.Context, prompt *Prompt, ...) (*Response, error)
- func (llm *Ollama) SetCheapModels(cheapModels []*Model)
- func (llm *Ollama) WaitUntilServing(ctx context.Context) error
- func (llm *Ollama) WarmModel(ctx context.Context, model *Model, keepWarmFor time.Duration, unloadFirst bool) (*ModelLoadStats, error)
- type Prompt
- type PromptJSON
- type Response
- func (r *Response) Done() bool
- func (r *Response) EvalDuration() time.Duration
- func (r *Response) LoadDuration() time.Duration
- func (r *Response) NumTokens() int
- func (r *Response) OutputTokensPerSecond() float64
- func (r *Response) PromptEvalDuration() time.Duration
- func (r *Response) String() string
- func (r *Response) Text() string
- func (r *Response) TimePerOutputTokenAverage() time.Duration
- func (r *Response) TimePerOutputTokenQuantile(quantile float64) time.Duration
- func (r *Response) TimeToFirstToken() time.Duration
- func (r *Response) TimeToLastToken() time.Duration
- func (r *Response) TokenGenerationStdDev() time.Duration
- func (r *Response) TotalDuration() time.Duration
- type ResponseJSON
- type ResponseMetrics
- type Server
Constants ¶
const (
// Port is the port used by the ollama server.
Port = 11434
)
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ConversationContext ¶
type ConversationContext []int
ConversationContext represents a conversational context. It is returned by a response and may be passed to a follow-up prompt.
type Model ¶
type Model struct {
	// Name is the name of the ollama model, e.g. "codellama:7b".
	Name string

	// Options maps parameter names to JSON-compatible values.
	Options map[string]any
}
Model encodes a model and options for it.
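For illustration, a Model may also be constructed directly from its exported fields. A minimal sketch, where the model name and option value are placeholders; "temperature" is the option adjusted by RaiseTemperature below:

model := &ollama.Model{
	Name: "codellama:7b",
	// Placeholder options; "temperature" is the option that
	// RaiseTemperature adjusts.
	Options: map[string]any{
		"temperature": 0.0,
	},
}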
func ZeroTemperatureModel ¶
ZeroTemperatureModel returns a Model with the given name and an initial temperature setting of zero. This setting allows for consistent results across runs.
func (*Model) RaiseTemperature ¶
func (m *Model) RaiseTemperature()
RaiseTemperature increases the "temperature" option of the model, if any.
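A short sketch combining the two helpers. This assumes ZeroTemperatureModel takes only the model name, which is not part of the signatures shown here, and the model name is a placeholder:

// Assumed signature: ZeroTemperatureModel(name string) *Model (not shown above).
model := ollama.ZeroTemperatureModel("codellama:7b")

// If deterministic output keeps producing an unusable answer, later attempts
// can use a slightly higher temperature.
model.RaiseTemperature()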
type ModelLoadStats ¶
type ModelLoadStats struct {
	// ClientReportedDuration is the duration to load the model as perceived
	// by the client, measured by HTTP client metrics.
	ClientReportedDuration time.Duration
}
ModelLoadStats holds metrics about the model loading process.
type Ollama ¶
type Ollama struct {
	// ModelNames is the list of available model names.
	ModelNames []string

	// HasGPU is set depending on whether the LLM has GPU access.
	// ollama supports running both on CPU and GPU, and detects this
	// by spawning nvidia-smi.
	HasGPU bool

	// contains filtered or unexported fields
}
Ollama is an ollama client.
func New ¶
New starts a new Ollama server in the given container, then waits for it to serve and returns the client.
func NewDocker ¶
func NewDocker(ctx context.Context, cont *dockerutil.Container, logger testutil.Logger) (*Ollama, error)
NewDocker returns a new Ollama client talking to an Ollama server that runs in a local Docker container.
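A hedged sketch of constructing a client in a test. It assumes the gVisor dockerutil helpers shown below and that *testing.T satisfies the logger interface; the container setup is a placeholder rather than the image actually used by the ollama tests:

func TestOllamaSmoke(t *testing.T) {
	ctx := context.Background()
	// Placeholder container setup; the real ollama server image and run
	// options are not documented here.
	cont := dockerutil.MakeContainer(ctx, t)
	defer cont.CleanUp(ctx)

	llm, err := ollama.NewDocker(ctx, cont, t)
	if err != nil {
		t.Fatalf("NewDocker failed: %v", err)
	}
	_ = llm // Issue prompts via llm.Prompt or llm.PromptUntil.
}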
func (*Ollama) PromptUntil ¶
func (llm *Ollama) PromptUntil(ctx context.Context, prompt *Prompt, iterate func(*Prompt, *Response) (*Prompt, error)) (*Response, error)
PromptUntil repeatedly issues a prompt until `iterate` returns a nil error. `iterate` may optionally return an updated `Prompt` which will be used to follow up. This is useful to work around the flakiness of LLMs in tests.
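For example, a test may keep retrying until the answer contains an expected substring, switching to a hotter copy of the prompt after each failed attempt (a sketch; `llm`, `ctx`, and `prompt` are assumed to be in scope, the expected answer is a placeholder, and WithHotterModel is documented under Prompt below):

resp, err := llm.PromptUntil(ctx, prompt,
	func(p *ollama.Prompt, r *ollama.Response) (*ollama.Prompt, error) {
		if strings.Contains(r.Text(), "Paris") {
			return p, nil // Acceptable answer; stop iterating.
		}
		// Retry with a higher temperature to escape a repeated bad answer.
		return p.WithHotterModel(), fmt.Errorf("unexpected answer: %q", r.Text())
	})
if err != nil {
	return fmt.Errorf("prompt never converged: %w", err)
}
_ = resp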
func (*Ollama) SetCheapModels ¶
SetCheapModels informs this Ollama client of the models it may use that are known to be cheap. This is useful when forcefully unloading a model by swapping it with another one, as it ensures that the replacement model is small. For this reason, at least two models should be specified here.
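A sketch, again assuming ZeroTemperatureModel takes only a model name; the names below are placeholders standing in for genuinely small models:

// Register at least two small models so that a forced unload always has a
// cheap model available to swap in.
llm.SetCheapModels([]*ollama.Model{
	ollama.ZeroTemperatureModel("gemma:2b"),
	ollama.ZeroTemperatureModel("qwen2:0.5b"),
})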
func (*Ollama) WaitUntilServing ¶
WaitUntilServing waits until ollama is serving, or the context expires.
func (*Ollama) WarmModel ¶
func (llm *Ollama) WarmModel(ctx context.Context, model *Model, keepWarmFor time.Duration, unloadFirst bool) (*ModelLoadStats, error)
WarmModel pre-warms a model in memory and keeps it warm for `keepWarmFor`. If `unloadFirst` is true, another model will be loaded before loading the requested model. This ensures that the model was loaded from a cold state.
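For example, a benchmark can force a cold load and report how long it took (a sketch; the keep-warm duration and error handling are placeholders):

// Unload whatever is resident first so the load below starts cold, then keep
// the model warm for the rest of the benchmark.
stats, err := llm.WarmModel(ctx, model, 30*time.Minute, true /* unloadFirst */)
if err != nil {
	return fmt.Errorf("failed to warm model %q: %w", model.Name, err)
}
fmt.Printf("cold model load took %v (client-reported)\n", stats.ClientReportedDuration)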
type Prompt ¶
type Prompt struct {
	// Model is the model to query.
	Model *Model

	// If set, keep the model alive in memory for the given duration after this
	// prompt is answered. A zero duration will use the ollama default (a few
	// minutes). Note that model unloading is asynchronous, so the model will
	// not be fully unloaded after only `KeepModelAlive` beyond prompt response.
	KeepModelAlive time.Duration

	// Query is the prompt string.
	// Common leading whitespace will be removed.
	Query string

	// Context is the conversational context to follow up on, if any.
	// This is returned from `Response`.
	Context ConversationContext

	// contains filtered or unexported fields
}
Prompt is an ollama prompt.
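A sketch of building and issuing a prompt from the exported fields above; the model, query, and keep-alive duration are placeholders. Because common leading whitespace is stripped from Query (see CleanQuery below), the query can be written as an indented raw string:

prompt := &ollama.Prompt{
	Model: model,
	// Keep the model loaded briefly so an immediate follow-up prompt does not
	// pay the load cost again.
	KeepModelAlive: 2 * time.Minute,
	// Common leading whitespace is removed, so the query can stay indented.
	Query: `
		What is the capital of France?
		Answer in one word.
	`,
}
resp, err := llm.Prompt(ctx, prompt)
if err != nil {
	return fmt.Errorf("prompt failed: %w", err)
}
fmt.Println(resp.Text())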
func (*Prompt) AddImage ¶
AddImage attaches an image to the prompt. Returns itself for chainability.
func (*Prompt) CleanQuery ¶
CleanQuery removes common whitespace from query lines, and all leading/trailing whitespace-only lines. It makes it possible to specify query strings as indented strings without breaking visual continuity in Go code. For example (where dots represent spaces):
"""\n ..The Quick Brown Fox\n ..Jumps Over\n ....The Lazy Dog\n ."""
becomes:
""The Quick Brown Fox\n Jumps Over\n ..The Lazy Dog"""
func (*Prompt) WithHotterModel ¶
WithHotterModel returns a copy of this prompt with the same model having a higher temperature.
type PromptJSON ¶
type PromptJSON struct {
	Model     string              `json:"model"`
	Prompt    string              `json:"prompt,omitempty"`
	Images    []string            `json:"images"`
	Stream    bool                `json:"stream"`
	Context   ConversationContext `json:"context"`
	Options   map[string]any      `json:"options"`
	KeepAlive string              `json:"keep_alive,omitempty"`
}
PromptJSON encodes the JSON data for a query.
type Response ¶
type Response struct {
// contains filtered or unexported fields
}
Response represents a response to a query from Ollama.
func (*Response) EvalDuration ¶
EvalDuration returns the response evaluation time.
func (*Response) LoadDuration ¶
LoadDuration returns the model load duration as reported by the ollama server.
func (*Response) OutputTokensPerSecond ¶
OutputTokensPerSecond computes the average number of output tokens generated per second.
func (*Response) PromptEvalDuration ¶
PromptEvalDuration returns the prompt evaluation time.
func (*Response) TimePerOutputTokenAverage ¶
TimePerOutputTokenAverage computes the average time to generate an output token.
func (*Response) TimePerOutputTokenQuantile ¶
TimePerOutputTokenQuantile computes a quantile of the time it takes to generate an output token.
func (*Response) TimeToFirstToken ¶
TimeToFirstToken returns the time it took between the request starting and the first token being received by the client.
func (*Response) TimeToLastToken ¶
TimeToLastToken returns the time it took between the request starting and the last token being received by the client.
func (*Response) TokenGenerationStdDev ¶
TokenGenerationStdDev returns the standard deviation of the time between token generations.
func (*Response) TotalDuration ¶
TotalDuration returns the total response generation time.
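Taken together, these accessors make it easy to report latency and throughput for a single response; a short sketch, assuming `resp` is a *Response returned by Prompt:

fmt.Printf("time to first token: %v\n", resp.TimeToFirstToken())
fmt.Printf("time to last token: %v\n", resp.TimeToLastToken())
fmt.Printf("median time per output token: %v\n", resp.TimePerOutputTokenQuantile(0.5))
fmt.Printf("throughput: %.1f tokens/sec over %d tokens\n",
	resp.OutputTokensPerSecond(), resp.NumTokens())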
type ResponseJSON ¶
type ResponseJSON struct {
	Model           string              `json:"model"`
	CreatedAt       time.Time           `json:"created_at"`
	Response        string              `json:"response"`
	Done            bool                `json:"done"`
	TotalNanos      int                 `json:"total_duration"`
	LoadNanos       int                 `json:"load_duration"`
	EvalCount       int                 `json:"eval_count"`
	EvalNanos       int                 `json:"eval_duration"`
	PromptEvalCount int                 `json:"prompt_eval_count"`
	PromptEvalNanos int                 `json:"prompt_eval_duration"`
	Context         ConversationContext `json:"context"`
}
ResponseJSON is the JSON-format response from ollama about a prompt. Note that in `streamed` mode, the `Response` field contains a single token. To recover the whole response, all `Response` fields must be concatenated until the last `ResponseJSON`, identified as such by the `Done` field.
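A sketch of the reassembly described above, decoding one ResponseJSON per stream chunk and concatenating the Response fields until Done is true; the io.Reader carrying the stream is a placeholder:

// reassemble concatenates streamed ResponseJSON chunks into the full response text.
func reassemble(body io.Reader) (string, error) {
	var sb strings.Builder
	dec := json.NewDecoder(body)
	for {
		var chunk ollama.ResponseJSON
		if err := dec.Decode(&chunk); err != nil {
			if err == io.EOF {
				return "", fmt.Errorf("stream ended before a Done response")
			}
			return "", err
		}
		sb.WriteString(chunk.Response)
		if chunk.Done {
			return sb.String(), nil
		}
	}
}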
type ResponseMetrics ¶
type ResponseMetrics struct {
	// ProgramStarted is the time when the program started.
	ProgramStarted time.Time `json:"program_started"`

	// RequestSent is the time when the HTTP request was sent.
	RequestSent time.Time `json:"request_sent"`

	// ResponseReceived is the time when the HTTP response headers were received.
	ResponseReceived time.Time `json:"response_received"`

	// FirstByteRead is the time when the first HTTP response body byte was read.
	FirstByteRead time.Time `json:"first_byte_read"`

	// LastByteRead is the time when the last HTTP response body byte was read.
	LastByteRead time.Time `json:"last_byte_read"`
}
ResponseMetrics are HTTP request metrics from an ollama API query. This is the same JSON struct as defined in `images/gpu/ollama/client/client.go`.
type Server ¶
type Server interface {
	// InstrumentedRequest performs an instrumented HTTP request against the
	// ollama server, using the `gpu/ollama_client` ollama image.
	// `argvFn` takes in a `protocol://host:port` string and returns a
	// command-line to use for making an instrumented HTTP request against the
	// ollama server.
	// InstrumentedRequest should return the logs from the request container.
	InstrumentedRequest(ctx context.Context, argvFn func(hostPort string) []string) ([]byte, error)

	// Logs retrieves logs from the server.
	Logs(ctx context.Context) (string, error)
}
Server performs requests against an ollama server.