Documentation ¶
Overview ¶
Package llm runs an LLM locally via llama.cpp, llamafile, or a Python server. It takes care of everything, including fetching GGUF-packed models from Hugging Face.
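Example ¶
A minimal end-to-end sketch. The constructor is referenced as NewLLM in the Options documentation but is not listed in this index, so its signature, the import path, the Role constants, and the Message fields are all assumptions for illustration:
package main

import (
	"context"
	"fmt"
	"log"

	"example.com/llm" // hypothetical import path for this package
)

func main() {
	ctx := context.Background()
	// NewLLM and its signature are assumed; only Options is documented below.
	s, err := llm.NewLLM(ctx, &llm.Options{ContextLength: 8192})
	if err != nil {
		log.Fatal(err)
	}
	defer s.Close()

	// Message fields and Role constants are assumed from the index.
	msgs := []llm.Message{
		{Role: llm.System, Content: "You are a concise assistant."},
		{Role: llm.User, Content: "Say hello."},
	}
	reply, err := s.Prompt(ctx, msgs, 512, 0, 1.0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(reply)
}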
Index ¶
- type Conversation
- type KnownLLM
- type Memory
- type Message
- type Metrics
- type Options
- type PackedFileRef
- func (p PackedFileRef) Author() string
- func (p PackedFileRef) Basename() string
- func (p PackedFileRef) ModelRef() huggingface.ModelRef
- func (p PackedFileRef) Repo() string
- func (p PackedFileRef) RepoID() string
- func (p PackedFileRef) RepoURL() string
- func (p PackedFileRef) Revision() string
- func (p PackedFileRef) Validate() error
- type PackedRepoRef
- type PromptEncoding
- type Role
- type Session
- func (l *Session) Close() error
- func (l *Session) GetHealth(ctx context.Context) (string, error)
- func (l *Session) GetMetrics(ctx context.Context, m *Metrics) error
- func (l *Session) Prompt(ctx context.Context, msgs []Message, maxtoks, seed int, temperature float64) (string, error)
- func (l *Session) PromptStreaming(ctx context.Context, msgs []Message, maxtoks, seed int, temperature float64, ...) error
- type TokenPerformance
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Conversation ¶
type Conversation struct {
	User       string
	Channel    string
	Started    time.Time
	LastUpdate time.Time
	Messages   []Message
	// contains filtered or unexported fields
}
Conversation is a conversation with one user.
type KnownLLM ¶
type KnownLLM struct {
	// Source is the repository in the form "hf:<author>/<repo>/<basename>".
	Source PackedFileRef `yaml:"source"`

	// PackagingType is the file format used in the model. It can be one of
	// "safetensors" or "gguf".
	PackagingType string

	// Upstream is the upstream repo in the form "hf:<author>/<repo>" when the
	// model is based on another one.
	Upstream PackedRepoRef `yaml:"upstream"`

	// PromptEncoding is only used when using llama-server in /completion mode.
	// When not present, llama-server is used in OpenAI compatible API mode.
	PromptEncoding *PromptEncoding `yaml:"prompt_encoding"`

	// contains filtered or unexported fields
}
KnownLLM is a known model.
Currently assumes the model is hosted on HuggingFace.
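For illustration, an entry could be declared as below. The repository names are hypothetical; the Source value follows the field's documented "hf:<author>/<repo>/<basename>" form:
k := llm.KnownLLM{
	// Hypothetical repositories, for illustration only.
	Source:        llm.PackedFileRef("hf:Qwen/Qwen2-7B-Instruct-GGUF/qwen2-7b-instruct"),
	PackagingType: "gguf",
	Upstream:      llm.PackedRepoRef("hf:Qwen/Qwen2-7B-Instruct"),
	// PromptEncoding left nil: llama-server runs in OpenAI compatible API mode.
}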
type Memory ¶
type Memory struct {
// contains filtered or unexported fields
}
Memory holds the bot's conversations.
func (*Memory) Get ¶
func (m *Memory) Get(user, channel string) *Conversation
Get returns a previous conversation, or a new one if none exists yet for this user and channel.
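Example ¶
A short sketch, assuming mem is an initialized *Memory (its construction is not documented in this section):
conv := mem.Get("alice", "#general")
fmt.Printf("%d message(s) since %s\n", len(conv.Messages), conv.Started)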
type Metrics ¶
type Metrics struct {
	Prompt             TokenPerformance
	Generated          TokenPerformance
	KVCacheUsage       float64
	KVCacheTokens      int
	RequestsProcessing int
	RequestedPending   int
}
Metrics represents the metrics for the LLM server.
type Options ¶
type Options struct {
	// Remote is the host:port of a pre-existing server to use instead of
	// starting our own.
	Remote string

	// Model specifies a model to use.
	//
	// It will be selected automatically from KnownLLMs.
	//
	// Use "python" to use the integrated python backend.
	Model PackedFileRef

	// ContextLength limits the context length. This is useful with the newer
	// 128K context window models, which would otherwise require too much
	// memory and run quite slowly. A good value is 8192 or 32768.
	ContextLength int `yaml:"context_length"`

	// contains filtered or unexported fields
}
Options for NewLLM.
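Example ¶
Two illustrative configurations, one remote and one local; the model reference is hypothetical:
// Use a pre-existing server instead of starting one.
remote := llm.Options{Remote: "127.0.0.1:8080"}

// Start a local server with a bounded context window.
local := llm.Options{
	Model:         llm.PackedFileRef("hf:Qwen/Qwen2-7B-Instruct-GGUF/HEAD/qwen2-7b-instruct.gguf"), // hypothetical
	ContextLength: 8192,
}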
type PackedFileRef ¶
type PackedFileRef string
PackedFileRef is a packed reference to a file in a Hugging Face repository.
The form is "hf:<author>/<repo>/HEAD/<file>".
HEAD is the git commit reference or "revision". "HEAD" means the default branch; it can be replaced with a branch name or a commit hash. The default branch used by the official huggingface_hub Python library is "main".
See DEFAULT_REVISION in https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/constants.py
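Example ¶
A sketch of building and inspecting a reference with the accessors documented below. The repository is hypothetical, and the expected outputs assume MakePackedFileRef joins its parts exactly as the documented form:
ref := llm.MakePackedFileRef("TheBloke", "Llama-2-7B-GGUF", "HEAD", "llama-2-7b.Q4_K_M.gguf")
if err := ref.Validate(); err != nil {
	log.Fatal(err)
}
fmt.Println(ref.Author())   // TheBloke
fmt.Println(ref.Repo())     // Llama-2-7B-GGUF
fmt.Println(ref.Revision()) // HEAD
fmt.Println(ref.RepoID())   // TheBloke/Llama-2-7B-GGUF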
func MakePackedFileRef ¶
func MakePackedFileRef(author, repo, revision, file string) PackedFileRef
MakePackedFileRef returns a PackedFileRef assembled from the given author, repo, revision, and file.
func (PackedFileRef) Author ¶
func (p PackedFileRef) Author() string
Author returns the <author> part of the packed reference.
func (PackedFileRef) Basename ¶
func (p PackedFileRef) Basename() string
Basename returns the basename part of this reference.
func (PackedFileRef) ModelRef ¶
func (p PackedFileRef) ModelRef() huggingface.ModelRef
ModelRef returns the ModelRef reference to the repo containing this file.
func (PackedFileRef) Repo ¶
func (p PackedFileRef) Repo() string
Repo returns the <repo> part of the packed reference.
func (PackedFileRef) RepoID ¶
func (p PackedFileRef) RepoID() string
RepoID returns the canonical "<author>/<repo>" for this repository.
func (PackedFileRef) RepoURL ¶
func (p PackedFileRef) RepoURL() string
RepoURL returns the canonical URL for this repository.
func (PackedFileRef) Revision ¶
func (p PackedFileRef) Revision() string
Revision returns the HEAD part of the packed reference.
func (PackedFileRef) Validate ¶
func (p PackedFileRef) Validate() error
Validate checks for obvious errors in the string.
type PackedRepoRef ¶
type PackedRepoRef string
PackedRepoRef is a packed reference to a Hugging Face repository.
The form is "hf:<author>/<repo>".
func (PackedRepoRef) ModelRef ¶
func (p PackedRepoRef) ModelRef() huggingface.ModelRef
ModelRef converts to a ModelRef reference.
func (PackedRepoRef) RepoID ¶
func (p PackedRepoRef) RepoID() string
RepoID returns the canonical "<author>/<repo>" for this repository.
func (PackedRepoRef) RepoURL ¶
func (p PackedRepoRef) RepoURL() string
RepoURL returns the canonical URL for this repository.
func (PackedRepoRef) Validate ¶
func (p PackedRepoRef) Validate() error
Validate checks for obvious errors in the string.
type PromptEncoding ¶
type PromptEncoding struct {
	// Prompt encoding.
	BeginOfText              string `yaml:"begin_of_text"`
	SystemTokenStart         string `yaml:"system_token_start"`
	SystemTokenEnd           string `yaml:"system_token_end"`
	UserTokenStart           string `yaml:"user_token_start"`
	UserTokenEnd             string `yaml:"user_token_end"`
	AssistantTokenStart      string `yaml:"assistant_token_start"`
	AssistantTokenEnd        string `yaml:"assistant_token_end"`
	ToolsAvailableTokenStart string `yaml:"tools_available_token_start"`
	ToolsAvailableTokenEnd   string `yaml:"tools_available_token_end"`
	ToolCallTokenStart       string `yaml:"tool_call_token_start"`
	ToolCallTokenEnd         string `yaml:"tool_call_token_end"`
	ToolCallResultTokenStart string `yaml:"tool_call_result_token_start"`
	ToolCallResultTokenEnd   string `yaml:"tool_call_result_token_end"`
	// contains filtered or unexported fields
}
PromptEncoding describes how to encode the prompt.
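Example ¶
An illustrative encoding using Llama 3 style chat template tokens. The correct strings depend entirely on the model in use, so treat these values as placeholders:
enc := llm.PromptEncoding{
	BeginOfText:         "<|begin_of_text|>",
	SystemTokenStart:    "<|start_header_id|>system<|end_header_id|>\n\n",
	SystemTokenEnd:      "<|eot_id|>",
	UserTokenStart:      "<|start_header_id|>user<|end_header_id|>\n\n",
	UserTokenEnd:        "<|eot_id|>",
	AssistantTokenStart: "<|start_header_id|>assistant<|end_header_id|>\n\n",
	AssistantTokenEnd:   "<|eot_id|>",
}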
type Session ¶
type Session struct {
	HF       *huggingface.Client
	Model    PackedFileRef
	Encoding *PromptEncoding
	// contains filtered or unexported fields
}
Session runs a llama.cpp or llamafile server and runs queries on it.
While the model is expected to be an instruct variant, this is not a requirement.
func (*Session) GetMetrics ¶
func (l *Session) GetMetrics(ctx context.Context, m *Metrics) error
GetMetrics retrieves the performance statistics from the server.
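Example ¶
A sketch that polls the server and reports throughput via TokenPerformance.Rate. Whether KVCacheUsage is a 0..1 fraction is an assumption here:
func logMetrics(ctx context.Context, s *llm.Session) error {
	var m llm.Metrics
	if err := s.GetMetrics(ctx, &m); err != nil {
		return err
	}
	fmt.Printf("prompt: %.1f tok/s, generated: %.1f tok/s, KV cache: %.0f%%\n",
		m.Prompt.Rate(), m.Generated.Rate(), m.KVCacheUsage*100) // KVCacheUsage assumed to be a fraction
	return nil
}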
func (*Session) Prompt ¶
func (l *Session) Prompt(ctx context.Context, msgs []Message, maxtoks, seed int, temperature float64) (string, error)
Prompt prompts the LLM and returns the reply.
See PromptStreaming for the argument values.
The first message is assumed to be the system prompt.
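Example ¶
A sketch of a one-shot helper; the Message and Role shapes are assumed from the index, as they are not documented in this section:
func ask(ctx context.Context, s *llm.Session, question string) (string, error) {
	msgs := []llm.Message{
		// The first message is the system prompt.
		{Role: llm.System, Content: "You are a terse assistant."},
		{Role: llm.User, Content: question},
	}
	// 512 max tokens; non-zero seed for best-effort determinism;
	// temperature 1.0 as recommended in PromptStreaming's documentation.
	return s.Prompt(ctx, msgs, 512, 1, 1.0)
}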
func (*Session) PromptStreaming ¶
func (l *Session) PromptStreaming(ctx context.Context, msgs []Message, maxtoks, seed int, temperature float64, words chan<- string) error
PromptStreaming prompts the LLM and returns the reply in the supplied channel.
Use a non-zero seed to get deterministic output (without strong guarantees).
Use low temperature (<1.0) to get more deterministic and repetitive output.
Use high temperature (>1.0) to get more creative and random text. High values can result in nonsensical responses.
A default of 1.0 is recommended, except that some models (like Mistral-Nemo) require a much lower value, <=0.3.
The first message is assumed to be the system prompt.
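Example ¶
A sketch that prints fragments as they arrive. Whether PromptStreaming closes the channel is not documented; this sketch assumes the caller owns it:
func askStreaming(ctx context.Context, s *llm.Session, msgs []llm.Message) error {
	words := make(chan string)
	done := make(chan struct{})
	go func() {
		for w := range words {
			fmt.Print(w) // print each fragment as it arrives
		}
		close(done)
	}()
	err := s.PromptStreaming(ctx, msgs, 512, 1, 1.0, words)
	close(words) // assumption: PromptStreaming does not close the channel itself
	<-done
	return err
}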
type TokenPerformance ¶
TokenPerformance describes the token processing performance reported in Metrics.
func (*TokenPerformance) Rate ¶
func (t *TokenPerformance) Rate() float64
Rate returns the number of tokens per second.