ollama

package

Version: v0.1.11-pre.0 (latest) | Published: Jun 16, 2024 | License: MIT | Imports: 8 | Imported by: 79

Repository

github.com/tmc/langchaingo

Documentation ¶

Index ¶

Variables
type LLM
- func New(opts ...Option) (*LLM, error)
type Option

Constants ¶

This section is empty.

Variables ¶

var (
	ErrEmptyResponse       = errors.New("no response")
	ErrIncompleteEmbedding = errors.New("not all input got embedded")
)
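
Because these are sentinel errors, callers would normally compare against them with errors.Is rather than by matching the message string. The sketch below is self-contained and redeclares the two variables locally so that it compiles without the library; checkEmbeddings and isIncomplete are hypothetical helpers written for illustration, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins that mirror the package's sentinel errors, so this
// sketch builds with the standard library only.
var (
	ErrEmptyResponse       = errors.New("no response")
	ErrIncompleteEmbedding = errors.New("not all input got embedded")
)

// checkEmbeddings is a hypothetical helper: it returns ErrIncompleteEmbedding,
// wrapped with extra context, when the number of vectors differs from the
// number of inputs.
func checkEmbeddings(inputs []string, vectors [][]float32) error {
	if len(vectors) != len(inputs) {
		return fmt.Errorf("embedded %d of %d inputs: %w",
			len(vectors), len(inputs), ErrIncompleteEmbedding)
	}
	return nil
}

// isIncomplete uses errors.Is, which still matches after the error
// has been wrapped with %w.
func isIncomplete(err error) bool {
	return errors.Is(err, ErrIncompleteEmbedding)
}

func main() {
	err := checkEmbeddings([]string{"a", "b"}, [][]float32{{0.1, 0.2}})
	fmt.Println(isIncomplete(err), err)
}
```

With the real package, the same comparison would be written against ollama.ErrIncompleteEmbedding or ollama.ErrEmptyResponse.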

Functions ¶

This section is empty.

Types ¶

type LLM ¶

type LLM struct {
	CallbacksHandler callbacks.Handler
	// contains filtered or unexported fields
}

LLM is an Ollama LLM implementation.

func New ¶

func New(opts ...Option) (*LLM, error)

New creates a new Ollama LLM implementation, configured by the given options.

func (*LLM) Call ¶

func (o *LLM) Call(ctx context.Context, prompt string, options ...llms.CallOption) (string, error)

Call implements the call interface for LLM.

func (*LLM) CreateEmbedding ¶

func (o *LLM) CreateEmbedding(ctx context.Context, inputTexts []string) ([][]float32, error)

func (*LLM) GenerateContent ¶ added in v0.1.4

func (o *LLM) GenerateContent(ctx context.Context, messages []llms.MessageContent, options ...llms.CallOption) (*llms.ContentResponse, error)

GenerateContent implements the Model interface.

type Option ¶

type Option func(*options)
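
The signature above is the functional-options pattern: each With... function returns a closure that mutates an unexported options value, and New applies the closures in order. The following is a minimal sketch of that pattern only. The options fields, the lower-case withServerURL and withNumCtx constructors, the apply function, and the default server URL are all assumptions made for illustration; the package's real struct is unexported and may look different.

```go
package main

import "fmt"

// options is a hypothetical stand-in for the package's unexported struct.
type options struct {
	serverURL string
	numCtx    int
}

// Option has the same shape as the package's Option type.
type Option func(*options)

// withServerURL and withNumCtx are illustrative analogues of
// WithServerURL and WithRunnerNumCtx.
func withServerURL(u string) Option {
	return func(o *options) { o.serverURL = u }
}

func withNumCtx(n int) Option {
	return func(o *options) { o.numCtx = n }
}

// apply starts from assumed defaults and runs each option in order,
// so a later option overrides an earlier one for the same field.
func apply(opts ...Option) options {
	o := options{serverURL: "http://localhost:11434", numCtx: 2048}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := apply(withNumCtx(4096), withServerURL("http://gpu-box:11434"))
	fmt.Printf("%+v\n", o)
}
```

One consequence of this design is that options not passed to the constructor simply keep their defaults, so new options can be added without breaking existing callers.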

func WithCustomTemplate ¶

func WithCustomTemplate(template string) Option

WithCustomTemplate overrides the templating done on the Ollama model side.

func WithFormat ¶ added in v0.1.6

func WithFormat(format string) Option

WithFormat sets the Ollama output format (currently Ollama only supports "json").

func WithHTTPClient ¶ added in v0.1.4

func WithHTTPClient(client *http.Client) Option

WithHTTPClient sets a custom HTTP client.
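
A custom client is typically used to set a timeout or to attach headers, for example when the Ollama server sits behind an authenticating proxy. The sketch below builds such a client with the standard library and checks it against a local test server. headerTransport, newClient, and probe are hypothetical names, and the 30-second timeout and Authorization header are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// headerTransport is a hypothetical RoundTripper that adds one header
// to every outgoing request.
type headerTransport struct {
	base       http.RoundTripper
	key, value string
}

func (t headerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// Clone first: a RoundTripper should not modify the caller's request.
	clone := req.Clone(req.Context())
	clone.Header.Set(t.key, t.value)
	return t.base.RoundTrip(clone)
}

// newClient returns an *http.Client with an overall timeout and the
// header-injecting transport.
func newClient(token string) *http.Client {
	return &http.Client{
		Timeout: 30 * time.Second,
		Transport: headerTransport{
			base:  http.DefaultTransport,
			key:   "Authorization",
			value: "Bearer " + token,
		},
	}
}

// probe sends one request through the client to a local test server and
// returns the Authorization header that the server received.
func probe(token string) (string, error) {
	seen := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("Authorization")
	}))
	defer srv.Close()

	resp, err := newClient(token).Get(srv.URL)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	got, err := probe("example-token")
	fmt.Println(got, err)
}
```

A client built this way is the kind of value that would be passed to WithHTTPClient. Note that http.Client.Timeout covers the whole exchange including reading the body, so long generations may need a larger value.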

func WithKeepAlive ¶ added in v0.1.9

func WithKeepAlive(keepAlive string) Option

WithKeepAlive controls how long the model stays loaded in memory after a request (default: 5m). It is only supported by Ollama v0.1.23 and later.

- A positive duration (e.g. 20m, 1h or 30) keeps the model loaded for that duration.
- A negative duration (e.g. -1) keeps the model loaded indefinitely.
- 0 unloads the model immediately once the request has finished.
- If not set, the model stays loaded for 5 minutes.
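
The keep-alive rules can be sketched as a small classifier. This is one possible reading of the rules as documented here, not Ollama's actual parser: in particular, treating a bare integer such as 30 or -1 as a number of seconds is an assumption, and parseKeepAlive and keepAlivePolicy are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseKeepAlive accepts either a bare integer (assumed to mean seconds)
// or a Go duration string such as "20m" or "1h".
func parseKeepAlive(s string) (time.Duration, error) {
	if n, err := strconv.Atoi(s); err == nil {
		return time.Duration(n) * time.Second, nil
	}
	return time.ParseDuration(s)
}

// keepAlivePolicy describes what the documented rules imply for a value.
func keepAlivePolicy(s string) (string, error) {
	d, err := parseKeepAlive(s)
	if err != nil {
		return "", fmt.Errorf("invalid keep-alive %q: %w", s, err)
	}
	switch {
	case d < 0:
		return "keep loaded indefinitely", nil
	case d == 0:
		return "unload immediately", nil
	default:
		return "keep loaded for " + d.String(), nil
	}
}

func main() {
	for _, v := range []string{"20m", "1h", "30", "-1", "0"} {
		policy, err := keepAlivePolicy(v)
		fmt.Println(v, "=>", policy, err)
	}
}
```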

func WithPredictMirostat ¶

func WithPredictMirostat(val int) Option

WithPredictMirostat enables Mirostat sampling for controlling perplexity (default: 0; 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0).

func WithPredictMirostatEta ¶

func WithPredictMirostatEta(val float32) Option

WithPredictMirostatEta influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate results in slower adjustments, while a higher learning rate makes the algorithm more responsive (default: 0.1).

func WithPredictMirostatTau ¶

func WithPredictMirostatTau(val float32) Option

WithPredictMirostatTau controls the balance between coherence and diversity of the output. A lower value results in more focused and coherent text (default: 5.0).

func WithPredictPenalizeNewline ¶

func WithPredictPenalizeNewline(val bool) Option

WithPredictPenalizeNewline penalizes newline tokens when applying the repeat penalty (default: true).

func WithPredictRepeatLastN ¶

func WithPredictRepeatLastN(val int) Option

WithPredictRepeatLastN sets how far back the model looks to prevent repetition (default: 64, 0 = disabled, -1 = num_ctx).

func WithPredictTFSZ ¶

func WithPredictTFSZ(val float32) Option

WithPredictTFSZ sets tail free sampling, which reduces the impact of less probable tokens on the output. A higher value (e.g. 2.0) reduces the impact more, while a value of 1.0 disables this setting (default: 1).

func WithPredictTypicalP ¶

func WithPredictTypicalP(val float32) Option

WithPredictTypicalP enables locally typical sampling with parameter p (default: 1.0, 1.0 = disabled).

func WithRunnerEmbeddingOnly ¶

func WithRunnerEmbeddingOnly(val bool) Option

WithRunnerEmbeddingOnly makes the runner return only the embedding.

func WithRunnerF16KV ¶

func WithRunnerF16KV(val bool) Option

WithRunnerF16KV, if set to false, uses 32-bit floats instead of 16-bit floats for the key+value memory.

func WithRunnerLogitsAll ¶

func WithRunnerLogitsAll(val bool) Option

WithRunnerLogitsAll returns logits for all tokens, not just the last token.

func WithRunnerLowVRAM ¶

func WithRunnerLowVRAM(val bool) Option

WithRunnerLowVRAM, if set to true, does not allocate a VRAM scratch buffer for holding temporary results. This reduces VRAM usage at the cost of performance, particularly prompt processing speed.

func WithRunnerMainGPU ¶

func WithRunnerMainGPU(num int) Option

WithRunnerMainGPU controls, when using multiple GPUs, which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The selected GPU uses slightly more VRAM to store a scratch buffer for temporary results. By default, GPU 0 is used.

func WithRunnerNumBatch ¶

func WithRunnerNumBatch(num int) Option

WithRunnerNumBatch sets the batch size for prompt processing (default: 512).

func WithRunnerNumCtx ¶

func WithRunnerNumCtx(num int) Option

WithRunnerNumCtx sets the size of the context window used to generate the next token (default: 2048).

func WithRunnerNumGPU ¶

func WithRunnerNumGPU(num int) Option

WithRunnerNumGPU sets the number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support; set it to 0 to disable.

func WithRunnerNumGQA ¶

func WithRunnerNumGQA(num int) Option

WithRunnerNumGQA sets the number of GQA groups in the transformer layer. Required for some models.

func WithRunnerNumKeep ¶

func WithRunnerNumKeep(num int) Option

WithRunnerNumKeep specifies the number of tokens from the initial prompt to retain when the model resets its internal context.

func WithRunnerNumThread ¶

func WithRunnerNumThread(num int) Option

WithRunnerNumThread sets the number of threads to use during computation (default: auto).

func WithRunnerRopeFrequencyBase ¶

func WithRunnerRopeFrequencyBase(val float32) Option

WithRunnerRopeFrequencyBase sets the RoPE base frequency (default: loaded from model).

func WithRunnerRopeFrequencyScale ¶

func WithRunnerRopeFrequencyScale(val float32) Option

WithRunnerRopeFrequencyScale sets the RoPE frequency scaling factor (default: loaded from model).

func WithRunnerUseMLock ¶

func WithRunnerUseMLock(val bool) Option

WithRunnerUseMLock forces the system to keep the model in RAM.

func WithRunnerUseMMap ¶

func WithRunnerUseMMap(val bool) Option

WithRunnerUseMMap controls whether the model is memory-mapped; set it to false to disable memory-mapping. By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed.

func WithRunnerUseNUMA ¶

func WithRunnerUseNUMA(numa bool) Option

WithRunnerUseNUMA enables NUMA optimization on certain systems.

func WithRunnerVocabOnly ¶

func WithRunnerVocabOnly(val bool) Option

WithRunnerVocabOnly loads only the vocabulary, not the weights.

func WithServerURL ¶

func WithServerURL(rawURL string) Option

WithServerURL sets the URL of the Ollama instance to use.

func WithSystemPrompt ¶

func WithSystemPrompt(p string) Option

WithSystemPrompt sets the system prompt. This is only valid if WithCustomTemplate is not set and the Ollama model uses .System in its model template, or if WithCustomTemplate is set with a template that uses {{.System}}.
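
To see why the system prompt only takes effect when the template references .System, the sketch below renders two templates with Go's text/template package. It assumes the template is evaluated with Go template syntax against a value that has System and Prompt fields; the promptData type, the .Prompt field, the bracketed markers, and the render helper are assumptions for illustration, not the library's or Ollama's actual code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// promptData is a hypothetical view of the values a template can read.
type promptData struct {
	System string
	Prompt string
}

// withSystem references .System; withoutSystem does not, so any system
// prompt is silently dropped when it is used.
const (
	withSystem    = "{{if .System}}[system] {{.System}}\n{{end}}[user] {{.Prompt}}"
	withoutSystem = "[user] {{.Prompt}}"
)

// render parses tmpl and executes it against data.
func render(tmpl string, data promptData) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := promptData{System: "Be terse.", Prompt: "Hi"}
	for _, tmpl := range []string{withSystem, withoutSystem} {
		out, err := render(tmpl, data)
		fmt.Printf("%q %v\n", out, err)
	}
}
```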

Directories ¶

Path	Synopsis
internal/ollamaclient
