ollama

package
v0.1.11-pre.0
Published: Jun 16, 2024 License: MIT Imports: 8 Imported by: 79

Documentation

Constants

This section is empty.

Variables

var (
	ErrEmptyResponse       = errors.New("no response")
	ErrIncompleteEmbedding = errors.New("not all input got embedded")
)
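Because both variables are sentinel errors, callers can test for them with `errors.Is`, which also matches when the error has been wrapped. The sketch below is only an illustration: it redeclares the two sentinels locally so that it compiles without the library, and `describe` and `wrap` are hypothetical helpers, not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the package's exported sentinel errors, so that this
// sketch compiles on its own. In real code, use ollama.ErrEmptyResponse and
// ollama.ErrIncompleteEmbedding instead.
var (
	ErrEmptyResponse       = errors.New("no response")
	ErrIncompleteEmbedding = errors.New("not all input got embedded")
)

// describe maps an error to a short diagnosis (hypothetical helper).
// errors.Is also matches a sentinel that was wrapped with %w.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrEmptyResponse):
		return "model returned nothing"
	case errors.Is(err, ErrIncompleteEmbedding):
		return "some inputs were not embedded"
	default:
		return "other error: " + err.Error()
	}
}

// wrap simulates a caller adding context to an error (hypothetical helper).
func wrap(err error) error {
	return fmt.Errorf("embedding documents: %w", err)
}

func main() {
	fmt.Println(describe(wrap(ErrIncompleteEmbedding)))
	fmt.Println(describe(ErrEmptyResponse))
	fmt.Println(describe(nil))
}
```

Comparing with `==` would miss the wrapped case, which is why `errors.Is` is the safer default here.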

Functions

This section is empty.

Types

type LLM

type LLM struct {
	CallbacksHandler callbacks.Handler
	// contains filtered or unexported fields
}

LLM is an Ollama LLM implementation.

func New

func New(opts ...Option) (*LLM, error)

New creates a new Ollama LLM implementation.

func (*LLM) Call

func (o *LLM) Call(ctx context.Context, prompt string, options ...llms.CallOption) (string, error)

Call implements the call interface for LLM.

func (*LLM) CreateEmbedding

func (o *LLM) CreateEmbedding(ctx context.Context, inputTexts []string) ([][]float32, error)

func (*LLM) GenerateContent added in v0.1.4

func (o *LLM) GenerateContent(ctx context.Context, messages []llms.MessageContent, options ...llms.CallOption) (*llms.ContentResponse, error)

GenerateContent implements the Model interface.

type Option

type Option func(*options)
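Option follows the functional options pattern: each `With...` constructor returns a closure that mutates the unexported options struct, and `New` applies the closures in order. The following is a minimal sketch of that pattern under stated assumptions: the `options` fields, the `apply` helper, and the default URL are hypothetical stand-ins, since the real struct is unexported and not documented here.

```go
package main

import "fmt"

// options is a hypothetical stand-in for the package's unexported options
// struct; the real field names are not part of the documented API.
type options struct {
	model     string
	serverURL string
}

// Option mirrors the documented type: a function that mutates *options.
type Option func(*options)

// WithModel mimics the shape of the documented constructor of the same name.
func WithModel(model string) Option {
	return func(o *options) { o.model = model }
}

// WithServerURL mimics the shape of the documented constructor of the same name.
func WithServerURL(rawURL string) Option {
	return func(o *options) { o.serverURL = rawURL }
}

// apply starts from defaults and runs each Option in order, so a later
// option overrides an earlier one that sets the same field.
func apply(opts ...Option) options {
	o := options{serverURL: "http://localhost:11434"} // assumed default, for illustration
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := apply(WithModel("llama2"), WithModel("mistral"))
	fmt.Println(o.model, o.serverURL)
}
```

The pattern keeps the constructor signature stable while new settings are added, which is presumably why the package exposes so many `With...` functions.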

func WithCustomTemplate

func WithCustomTemplate(template string) Option

WithCustomTemplate overrides the templating done on the Ollama model side.

func WithFormat added in v0.1.6

func WithFormat(format string) Option

WithFormat sets the Ollama output format (currently Ollama only supports "json").
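When the output format is "json", the returned string can be decoded with `encoding/json`. The sketch below only illustrates that decoding step: the `answer` schema and the hard-coded response string are hypothetical, and in practice the prompt would also need to ask the model for these fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// answer is a hypothetical schema that the prompt would ask the model to follow.
type answer struct {
	City    string `json:"city"`
	Country string `json:"country"`
}

// parseAnswer decodes a model response that is expected to be a JSON object.
func parseAnswer(response string) (answer, error) {
	var a answer
	if err := json.Unmarshal([]byte(response), &a); err != nil {
		return answer{}, fmt.Errorf("response is not valid JSON: %w", err)
	}
	return a, nil
}

func main() {
	// Hard-coded stand-in for the string returned by the model.
	a, err := parseAnswer(`{"city": "Paris", "country": "France"}`)
	fmt.Println(a.City, a.Country, err)
}
```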

func WithHTTPClient added in v0.1.4

func WithHTTPClient(client *http.Client) Option

WithHTTPClient sets a custom HTTP client.
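A custom client is typically used to set timeouts or a custom transport. As a rough sketch, a client with an overall timeout could be built as follows; `newHTTPClient` and the 120-second value are illustrative choices, not library defaults.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newHTTPClient returns a client with an overall request timeout
// (hypothetical helper). The timeout covers the whole exchange, including
// reading the response body, so it should be generous for slow generations.
func newHTTPClient(timeoutSeconds int) *http.Client {
	return &http.Client{
		Timeout: time.Duration(timeoutSeconds) * time.Second,
	}
}

func main() {
	client := newHTTPClient(120)
	fmt.Println(client.Timeout)
	// With the library imported, the client would be passed as:
	//   llm, err := ollama.New(ollama.WithHTTPClient(client))
}
```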

func WithKeepAlive added in v0.1.9

func WithKeepAlive(keepAlive string) Option

WithKeepAlive controls how long the model stays loaded in memory after the request (default: 5m). It is only supported by Ollama v0.1.23 and later.

If set to a positive duration (e.g. 20m, 1h, or 30), the model stays loaded for that duration.
If set to a negative duration (e.g. -1), the model stays loaded indefinitely.
If set to 0, the model is unloaded immediately once the request finishes.
If not set, the model stays loaded for 5 minutes by default.
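The keep-alive rules can be summarized in a small client-side helper. This is a sketch for illustration only: the actual interpretation happens on the Ollama server, `describeKeepAlive` is a hypothetical function, and treating a bare number such as 30 as seconds is an assumption of this example.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// describeKeepAlive reports how a keep-alive value would be interpreted under
// the rules listed above. Bare numbers are treated as seconds here, which is
// an assumption of this sketch.
func describeKeepAlive(keepAlive string) (string, error) {
	if keepAlive == "" {
		return "default (5m0s)", nil
	}
	d, err := time.ParseDuration(keepAlive)
	if err != nil {
		// Values such as "30" or "-1" have no unit, so fall back to an integer.
		n, convErr := strconv.Atoi(keepAlive)
		if convErr != nil {
			return "", fmt.Errorf("invalid keep-alive %q: %w", keepAlive, err)
		}
		d = time.Duration(n) * time.Second
	}
	switch {
	case d < 0:
		return "loaded indefinitely", nil
	case d == 0:
		return "unloaded immediately", nil
	default:
		return "loaded for " + d.String(), nil
	}
}

func main() {
	for _, v := range []string{"", "20m", "30", "-1", "0"} {
		s, err := describeKeepAlive(v)
		fmt.Printf("%q: %s (err=%v)\n", v, s, err)
	}
}
```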

func WithModel

func WithModel(model string) Option

WithModel sets the model to use.

func WithPredictMirostat

func WithPredictMirostat(val int) Option

WithPredictMirostat enables Mirostat sampling for controlling perplexity (default: 0; 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0).

func WithPredictMirostatEta

func WithPredictMirostatEta(val float32) Option

WithPredictMirostatEta influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate results in slower adjustments, while a higher learning rate makes the algorithm more responsive (default: 0.1).

func WithPredictMirostatTau

func WithPredictMirostatTau(val float32) Option

WithPredictMirostatTau controls the balance between coherence and diversity of the output. A lower value results in more focused and coherent text (default: 5.0).
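The roles of eta and tau are easier to see in the feedback step that Mirostat-style samplers use: after each token, a threshold mu is moved against the difference between the observed surprise of the sampled token and the target tau, scaled by eta. The sketch below is a simplified illustration of that update only, not code from this package or from Ollama. Holding the observed surprise constant and starting mu at 2*tau are simplifications made for this example.

```go
package main

import "fmt"

// updateMu applies one feedback step: mu moves against the error between the
// observed surprise of the sampled token and the target tau, scaled by the
// learning rate eta.
func updateMu(mu, eta, tau, surprise float64) float64 {
	return mu - eta*(surprise-tau)
}

// simulate runs several steps against a constant observed surprise. Real
// sampling would recompute the surprise from each sampled token.
func simulate(eta, tau, surprise float64, steps int) float64 {
	mu := 2 * tau // starting value assumed for this sketch
	for i := 0; i < steps; i++ {
		mu = updateMu(mu, eta, tau, surprise)
	}
	return mu
}

func main() {
	// A larger eta moves mu further in the same number of steps.
	for _, eta := range []float64{0.1, 0.5} {
		fmt.Printf("eta=%.1f mu after 5 steps: %.2f\n", eta, simulate(eta, 5.0, 7.0, 5))
	}
}
```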

func WithPredictPenalizeNewline

func WithPredictPenalizeNewline(val bool) Option

WithPredictPenalizeNewline penalizes newline tokens when applying the repeat penalty (default: true).

func WithPredictRepeatLastN

func WithPredictRepeatLastN(val int) Option

WithPredictRepeatLastN sets how far back the model looks to prevent repetition (default: 64, 0 = disabled, -1 = num_ctx).
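The special values 0 and -1 can be read as a rule for choosing the look-back window. The helper below is a hypothetical sketch of that rule, not the runner's actual code. Treating other negative values as disabled and capping the window at the history length are assumptions of this example.

```go
package main

import "fmt"

// repeatWindow resolves the documented special values into a token count:
// -1 means "use the context size", 0 means "disabled".
// Other negative values are treated as disabled (assumption of this sketch).
func repeatWindow(repeatLastN, numCtx int) int {
	switch {
	case repeatLastN == -1:
		return numCtx
	case repeatLastN <= 0:
		return 0
	default:
		return repeatLastN
	}
}

// recentTokens returns the tail of the history that the repeat penalty
// would inspect for a given window.
func recentTokens(history []int, window int) []int {
	if window > len(history) {
		window = len(history)
	}
	return history[len(history)-window:]
}

func main() {
	history := []int{11, 12, 13, 14, 15}
	fmt.Println(recentTokens(history, repeatWindow(2, 2048)))
	fmt.Println(recentTokens(history, repeatWindow(0, 2048)))
	fmt.Println(recentTokens(history, repeatWindow(-1, 2048)))
}
```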

func WithPredictTFSZ

func WithPredictTFSZ(val float32) Option

WithPredictTFSZ sets the tail free sampling parameter z. Tail free sampling reduces the impact of less probable tokens on the output. A higher value (e.g., 2.0) reduces the impact more, while a value of 1.0 disables this setting (default: 1).

func WithPredictTypicalP

func WithPredictTypicalP(val float32) Option

WithPredictTypicalP enables locally typical sampling with parameter p (default: 1.0; 1.0 = disabled).

func WithRunnerEmbeddingOnly

func WithRunnerEmbeddingOnly(val bool) Option

WithRunnerEmbeddingOnly makes the runner return only the embedding.

func WithRunnerF16KV

func WithRunnerF16KV(val bool) Option

WithRunnerF16KV, if set to false, uses 32-bit floats instead of 16-bit floats for the key+value memory.

func WithRunnerLogitsAll

func WithRunnerLogitsAll(val bool) Option

WithRunnerLogitsAll returns logits for all tokens, not just the last token.

func WithRunnerLowVRAM

func WithRunnerLowVRAM(val bool) Option

WithRunnerLowVRAM disables allocation of a VRAM scratch buffer for holding temporary results. This reduces VRAM usage at the cost of performance, particularly prompt processing speed.

func WithRunnerMainGPU

func WithRunnerMainGPU(num int) Option

WithRunnerMainGPU controls, when using multiple GPUs, which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question uses slightly more VRAM to store a scratch buffer for temporary results. By default, GPU 0 is used.

func WithRunnerNumBatch

func WithRunnerNumBatch(num int) Option

WithRunnerNumBatch sets the batch size for prompt processing (default: 512).

func WithRunnerNumCtx

func WithRunnerNumCtx(num int) Option

WithRunnerNumCtx sets the size of the context window used to generate the next token (default: 2048).

func WithRunnerNumGPU

func WithRunnerNumGPU(num int) Option

WithRunnerNumGPU sets the number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support; set it to 0 to disable.

func WithRunnerNumGQA

func WithRunnerNumGQA(num int) Option

WithRunnerNumGQA sets the number of GQA groups in the transformer layer. Required for some models.

func WithRunnerNumKeep

func WithRunnerNumKeep(num int) Option

WithRunnerNumKeep specifies the number of tokens from the initial prompt to retain when the model resets its internal context.

func WithRunnerNumThread

func WithRunnerNumThread(num int) Option

WithRunnerNumThread sets the number of threads to use during computation (default: auto).

func WithRunnerRopeFrequencyBase

func WithRunnerRopeFrequencyBase(val float32) Option

WithRunnerRopeFrequencyBase sets the RoPE base frequency (default: loaded from model).

func WithRunnerRopeFrequencyScale

func WithRunnerRopeFrequencyScale(val float32) Option

WithRunnerRopeFrequencyScale sets the RoPE frequency scaling factor (default: loaded from model).

func WithRunnerUseMLock

func WithRunnerUseMLock(val bool) Option

WithRunnerUseMLock forces the system to keep the model in RAM.

func WithRunnerUseMMap

func WithRunnerUseMMap(val bool) Option

WithRunnerUseMMap controls memory-mapping of the model; set it to false to disable. By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed.

func WithRunnerUseNUMA

func WithRunnerUseNUMA(numa bool) Option

WithRunnerUseNUMA enables NUMA optimization on certain systems.

func WithRunnerVocabOnly

func WithRunnerVocabOnly(val bool) Option

WithRunnerVocabOnly loads only the vocabulary, not the weights.

func WithServerURL

func WithServerURL(rawURL string) Option

WithServerURL sets the URL of the Ollama instance to use.

func WithSystemPrompt

func WithSystemPrompt(p string) Option

WithSystemPrompt sets the system prompt. This is only valid if WithCustomTemplate is not set and the Ollama model uses .System in its model template, or if WithCustomTemplate is set with a template that uses {{.System}}.
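The dependency on {{.System}} can be illustrated by rendering two templates locally with Go's `text/template`, whose syntax the {{.System}} placeholder follows. This is a sketch only: the real rendering happens on the Ollama side, `render` and `promptData` are hypothetical, and the `.Prompt` variable is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// promptData holds the variables made available to the template. .System
// comes from the text above; .Prompt is an assumption of this sketch.
type promptData struct {
	System string
	Prompt string
}

// render executes a template string against a system prompt and a user
// prompt (hypothetical helper).
func render(tmpl, system, prompt string) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, promptData{System: system, Prompt: prompt}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	system, prompt := "Answer briefly.", "Why is the sky blue?"

	// The system prompt appears only when the template references .System.
	withSystem, _ := render("{{.System}}\nUser: {{.Prompt}}", system, prompt)
	withoutSystem, _ := render("User: {{.Prompt}}", system, prompt)

	fmt.Println(withSystem)
	fmt.Println("---")
	fmt.Println(withoutSystem)
}
```

In the second template the system prompt is silently dropped, which is the situation the note above warns about.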

Directories

Path Synopsis
internal
