Documentation ¶
Index ¶
- Variables
- type LLM
- func (o *LLM) Call(ctx context.Context, prompt string, options ...llms.CallOption) (string, error)
- func (o *LLM) CreateEmbedding(ctx context.Context, inputTexts []string) ([][]float32, error)
- func (o *LLM) GenerateContent(ctx context.Context, messages []llms.MessageContent, ...) (*llms.ContentResponse, error)
- type Option
- func WithCustomTemplate(template string) Option
- func WithFormat(format string) Option
- func WithHTTPClient(client *http.Client) Option
- func WithKeepAlive(keepAlive string) Option
- func WithModel(model string) Option
- func WithPredictMirostat(val int) Option
- func WithPredictMirostatEta(val float32) Option
- func WithPredictMirostatTau(val float32) Option
- func WithPredictPenalizeNewline(val bool) Option
- func WithPredictRepeatLastN(val int) Option
- func WithPredictTFSZ(val float32) Option
- func WithPredictTypicalP(val float32) Option
- func WithRunnerEmbeddingOnly(val bool) Option
- func WithRunnerF16KV(val bool) Option
- func WithRunnerLogitsAll(val bool) Option
- func WithRunnerLowVRAM(val bool) Option
- func WithRunnerMainGPU(num int) Option
- func WithRunnerNumBatch(num int) Option
- func WithRunnerNumCtx(num int) Option
- func WithRunnerNumGPU(num int) Option
- func WithRunnerNumGQA(num int) Option
- func WithRunnerNumKeep(num int) Option
- func WithRunnerNumThread(num int) Option
- func WithRunnerRopeFrequencyBase(val float32) Option
- func WithRunnerRopeFrequencyScale(val float32) Option
- func WithRunnerUseMLock(val bool) Option
- func WithRunnerUseMMap(val bool) Option
- func WithRunnerUseNUMA(numa bool) Option
- func WithRunnerVocabOnly(val bool) Option
- func WithServerURL(rawURL string) Option
- func WithSystemPrompt(p string) Option
Constants ¶
This section is empty.
Variables ¶
var (
	ErrEmptyResponse       = errors.New("no response")
	ErrIncompleteEmbedding = errors.New("not all input got embedded")
)
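Both variables are sentinel errors, so a caller would normally test for them with errors.Is instead of comparing message text. The sketch below is self-contained and does not call this package: it redeclares the two values locally, and embed and incomplete are hypothetical helpers. Whether the package wraps these errors before returning them is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented sentinel errors, redeclared so the sketch
// compiles on its own. Real code would refer to the package's variables.
var (
	ErrEmptyResponse       = errors.New("no response")
	ErrIncompleteEmbedding = errors.New("not all input got embedded")
)

// embed is a hypothetical stand-in for an embedding call. It returns one
// vector per input but skips empty strings, simulating a backend that
// fails to embed some inputs.
func embed(inputs []string) ([][]float32, error) {
	out := make([][]float32, 0, len(inputs))
	for _, in := range inputs {
		if in == "" {
			continue
		}
		out = append(out, []float32{float32(len(in))})
	}
	if len(out) != len(inputs) {
		// Wrapping with %w keeps the sentinel visible to errors.Is.
		return out, fmt.Errorf("embed: %w", ErrIncompleteEmbedding)
	}
	return out, nil
}

// incomplete reports whether err is, or wraps, ErrIncompleteEmbedding.
func incomplete(err error) bool {
	return errors.Is(err, ErrIncompleteEmbedding)
}

func main() {
	_, err := embed([]string{"hello", ""})
	if incomplete(err) {
		fmt.Println("some inputs were not embedded:", err)
	}
}
```

Using errors.Is keeps the check working even when an intermediate layer adds context to the error.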
Functions ¶
This section is empty.
Types ¶
type LLM ¶
LLM is an Ollama LLM implementation.
func (*LLM) CreateEmbedding ¶
func (*LLM) GenerateContent ¶
func (o *LLM) GenerateContent(ctx context.Context, messages []llms.MessageContent, options ...llms.CallOption) (*llms.ContentResponse, error)
GenerateContent implements the Model interface.
type Option ¶
type Option func(*options)
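The declaration above is the functional-options pattern: each With... constructor returns a closure that mutates a private options value. The following is a minimal sketch of that pattern, not the package's source. The fields of options, the apply helper, and the default server URL are assumptions made for illustration.

```go
package main

import "fmt"

// options is a hypothetical stand-in for the package's unexported struct;
// its real fields are not shown in this documentation.
type options struct {
	model     string
	serverURL string
}

// Option mirrors the documented declaration.
type Option func(*options)

// WithModel imitates the documented constructor of the same name.
func WithModel(model string) Option {
	return func(o *options) { o.model = model }
}

// WithServerURL imitates the documented constructor of the same name.
func WithServerURL(rawURL string) Option {
	return func(o *options) { o.serverURL = rawURL }
}

// apply is a hypothetical helper: it starts from defaults and runs each
// option in order, so a later option overrides an earlier one.
func apply(opts ...Option) options {
	o := options{serverURL: "http://localhost:11434"} // assumed default
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := apply(WithModel("llama2"), WithModel("mistral"))
	fmt.Println(o.model, o.serverURL) // prints "mistral http://localhost:11434"
}
```

Because options are applied in order, passing the same option twice leaves the last value in effect.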
func WithCustomTemplate ¶
WithCustomTemplate overrides the templating done on the Ollama model side.
func WithFormat ¶
WithFormat sets the Ollama output format (currently Ollama only supports "json").
func WithHTTPClient ¶
WithHTTPClient sets a custom HTTP client.
func WithKeepAlive ¶
WithKeepAlive controls how long the model stays loaded in memory after the request (default: 5m). It is only supported by Ollama v0.1.23 and later.
- If set to a positive duration (e.g. 20m, 1h or 30), the model stays loaded for the provided duration.
- If set to a negative duration (e.g. -1), the model stays loaded indefinitely.
- If set to 0, the model is unloaded immediately once finished.
- If not set, the model stays loaded for 5 minutes by default.
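The keep-alive rules above can be checked with a small classifier. This is a sketch for reasoning about the values, not the package's or the server's parsing code: keepAliveBehavior is a hypothetical helper, and treating a bare number such as 30 or -1 as a count of seconds is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// keepAliveBehavior classifies a keep-alive string by the documented rules.
// Assumption: a bare number ("30", "-1") is a count of seconds.
func keepAliveBehavior(keepAlive string) (string, error) {
	if keepAlive == "" {
		return "default (5m)", nil
	}
	d, err := time.ParseDuration(keepAlive)
	if err != nil {
		// Not a Go duration string; fall back to a bare number of seconds.
		secs, convErr := strconv.Atoi(keepAlive)
		if convErr != nil {
			return "", fmt.Errorf("invalid keep-alive %q", keepAlive)
		}
		d = time.Duration(secs) * time.Second
	}
	switch {
	case d < 0:
		return "stay loaded indefinitely", nil
	case d == 0:
		return "unload immediately", nil
	default:
		return "stay loaded for " + d.String(), nil
	}
}

func main() {
	for _, v := range []string{"20m", "30", "-1", "0", ""} {
		behavior, err := keepAliveBehavior(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q: %s\n", v, behavior)
	}
}
```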
func WithPredictMirostat ¶
WithPredictMirostat Enable Mirostat sampling for controlling perplexity (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0).
func WithPredictMirostatEta ¶
WithPredictMirostatEta Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive (Default: 0.1).
func WithPredictMirostatTau ¶
WithPredictMirostatTau Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text (Default: 5.0).
func WithPredictPenalizeNewline ¶
WithPredictPenalizeNewline Penalize newline tokens when applying the repeat penalty (default: true).
func WithPredictRepeatLastN ¶
WithPredictRepeatLastN sets how far back the model looks to prevent repetition (default: 64, 0 = disabled, -1 = num_ctx).
func WithPredictTFSZ ¶
WithPredictTFSZ sets tail free sampling, which reduces the impact of less probable tokens on the output. A higher value (e.g., 2.0) reduces the impact more, while a value of 1.0 disables this setting (default: 1).
func WithPredictTypicalP ¶
WithPredictTypicalP Enable locally typical sampling with parameter p (default: 1.0, 1.0 = disabled).
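The WithPredict... options describe sampling parameters that end up in the request sent to the Ollama server. As a rough illustration only, the sketch below encodes a few of them as JSON. The predictOptions struct and encode helper are hypothetical, and the key names (mirostat, mirostat_eta, mirostat_tau, repeat_last_n) are assumed to follow Ollama's parameter naming; this documentation does not show how the package serializes them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// predictOptions is a hypothetical subset of the sampling options.
// The JSON keys are assumed to match Ollama's parameter names.
type predictOptions struct {
	Mirostat    int     `json:"mirostat,omitempty"`
	MirostatEta float32 `json:"mirostat_eta,omitempty"`
	MirostatTau float32 `json:"mirostat_tau,omitempty"`
	RepeatLastN int     `json:"repeat_last_n,omitempty"`
}

// encode marshals the options; zero-valued fields are omitted.
func encode(p predictOptions) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encode(predictOptions{Mirostat: 2, MirostatEta: 0.1, MirostatTau: 5.0})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // prints {"mirostat":2,"mirostat_eta":0.1,"mirostat_tau":5}
}
```

One design caveat of this sketch: omitempty drops zero values, so it cannot distinguish "not set" from an explicit 0 such as repeat_last_n = 0 (disabled). Pointer fields would be needed to express that difference.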
func WithRunnerEmbeddingOnly ¶
WithRunnerEmbeddingOnly Only return the embedding.
func WithRunnerF16KV ¶
WithRunnerF16KV If set to false, use 32-bit floats instead of 16-bit floats for memory key+value.
func WithRunnerLogitsAll ¶
WithRunnerLogitsAll Return logits for all tokens, not just the last token.
func WithRunnerLowVRAM ¶
WithRunnerLowVRAM Do not allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed.
func WithRunnerMainGPU ¶
WithRunnerMainGPU When using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. By default GPU 0 is used.
func WithRunnerNumBatch ¶
WithRunnerNumBatch Set the batch size for prompt processing (default: 512).
func WithRunnerNumCtx ¶
WithRunnerNumCtx Sets the size of the context window used to generate the next token (Default: 2048).
func WithRunnerNumGPU ¶
WithRunnerNumGPU The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support; set it to 0 to disable.
func WithRunnerNumGQA ¶
WithRunnerNumGQA The number of GQA groups in the transformer layer. Required for some models.
func WithRunnerNumKeep ¶
WithRunnerNumKeep Specify the number of tokens from the initial prompt to retain when the model resets its internal context.
func WithRunnerNumThread ¶
WithRunnerNumThread Set the number of threads to use during computation (default: auto).
func WithRunnerRopeFrequencyBase ¶
WithRunnerRopeFrequencyBase RoPE base frequency (default: loaded from model).
func WithRunnerRopeFrequencyScale ¶
WithRunnerRopeFrequencyScale RoPE frequency scaling factor (default: loaded from model).
func WithRunnerUseMLock ¶
WithRunnerUseMLock Force system to keep model in RAM.
func WithRunnerUseMMap ¶
WithRunnerUseMMap Set to false to not memory-map the model. By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed.
func WithRunnerUseNUMA ¶
WithRunnerUseNUMA Use NUMA optimization on certain systems.
func WithRunnerVocabOnly ¶
WithRunnerVocabOnly Only load the vocabulary, no weights.
func WithServerURL ¶
WithServerURL Set the URL of the Ollama instance to use.
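Since the option takes a raw string, it can help to validate the URL before passing it in. The sketch below is one way to do that with net/url. validateServerURL is a hypothetical helper, not part of this package, and requiring an http or https scheme is an assumption about what an Ollama endpoint looks like.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateServerURL is a hypothetical pre-check for the string passed to
// WithServerURL. It requires an http(s) scheme and a non-empty host.
func validateServerURL(rawURL string) (*url.URL, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, fmt.Errorf("parse server URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("server URL %q: scheme must be http or https", rawURL)
	}
	if u.Host == "" {
		return nil, fmt.Errorf("server URL %q: missing host", rawURL)
	}
	return u, nil
}

func main() {
	for _, raw := range []string{"http://localhost:11434", "localhost:11434"} {
		u, err := validateServerURL(raw)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", u.Host)
	}
}
```

The second input shows a common slip: without a scheme, the string is not a usable HTTP endpoint, so the helper rejects it.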
func WithSystemPrompt ¶
WithSystemPrompt Set the system prompt. This is only valid if WithCustomTemplate is not set and the Ollama model uses .System in its model template OR if WithCustomTemplate is set using {{.System}}.
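The condition above follows from how Go templates work: a value only appears in the output if the template references it. The sketch below renders two templates with the standard text/template package to show the effect. promptData, render, and the template strings are hypothetical, and the server's actual template variables may include more than .System and .Prompt.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// promptData holds the variables the hypothetical templates reference.
type promptData struct {
	System string
	Prompt string
}

// render parses tmpl and executes it against data.
func render(tmpl string, data promptData) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := promptData{System: "You are terse.", Prompt: "Hi"}

	// References .System, so the system prompt reaches the model.
	withSystem, _ := render("{{.System}} USER: {{.Prompt}}", data)
	fmt.Println(withSystem) // prints "You are terse. USER: Hi"

	// Never references .System, so the system prompt is silently dropped.
	withoutSystem, _ := render("USER: {{.Prompt}}", data)
	fmt.Println(withoutSystem) // prints "USER: Hi"
}
```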