Documentation ¶
Index ¶
- Constants
- func Colorize(format string, opts ...interface{}) (n int, err error)
- func Eval(lctx *Context, vocab *ml.Vocab, model *Model, tokens []uint32, ...) error
- func ExtractTokens(r *ring.Ring, count int) []uint32
- func Resize(slice []float32, size int) []float32
- func ResizeInplace(slice *[]float32, size int)
- func SampleTopPTopK(logits []float32, lastNTokens *ring.Ring, lastNTokensSize uint32, topK uint32, ...) uint32
- type Context
- type ContextParams
- type HParams
- type KVCache
- type Layer
- type Model
- type ModelParams
- type ModelType
Constants ¶
const (
	LLAMA_FILE_VERSION           = 1
	LLAMA_FILE_MAGIC             = 0x67676a74 // 'ggjt' in hex
	LLAMA_FILE_MAGIC_OLD         = 0x67676d66 // 'ggmf' in hex
	LLAMA_FILE_MAGIC_UNVERSIONED = 0x67676d6c // 'ggml' pre-versioned files
)
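For illustration, a minimal sketch of how these values might be used to validate a model file header, assuming the magic (and, for versioned formats, the version) is stored as a little-endian uint32 as in llama.cpp; the checkMagic helper is hypothetical, not part of this package:

import (
	"encoding/binary"
	"fmt"
	"os"
)

// checkMagic reads the file header and verifies the magic and version.
func checkMagic(f *os.File) error {
	var magic uint32
	if err := binary.Read(f, binary.LittleEndian, &magic); err != nil {
		return err
	}
	switch magic {
	case LLAMA_FILE_MAGIC_UNVERSIONED:
		return nil // pre-versioned 'ggml' file: no version field follows
	case LLAMA_FILE_MAGIC, LLAMA_FILE_MAGIC_OLD:
		var version uint32
		if err := binary.Read(f, binary.LittleEndian, &version); err != nil {
			return err
		}
		if version != LLAMA_FILE_VERSION {
			return fmt.Errorf("unsupported file version %d", version)
		}
		return nil
	default:
		return fmt.Errorf("unknown file magic 0x%x", magic)
	}
}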
Variables ¶
This section is empty.
Functions ¶
func Eval ¶
func Eval(
	lctx *Context,
	vocab *ml.Vocab,
	model *Model,
	tokens []uint32,
	pastCount uint32,
	params *ModelParams,
) error
Eval runs one inference iteration over the LLaMA model.

	lctx      = model context with all LLaMA data
	tokens    = new batch of tokens to process
	pastCount = the number of context tokens processed so far
	params    = all other parameters, like the maximum number of threads allowed
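For orientation, a hedged sketch of one step of a generation loop built on Eval; ctx, vocab, model, tokens, and params are assumed to have been set up elsewhere:

// Process the new batch of tokens; pastCount tells Eval how much
// of the context has already been evaluated into the KV cache.
if err := llama.Eval(ctx, vocab, model, tokens, pastCount, params); err != nil {
	log.Fatal(err)
}
pastCount += uint32(len(tokens))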
func ExtractTokens ¶
func ExtractTokens(r *ring.Ring, count int) []uint32

ExtractTokens extracts a slice of count tokens from the ring buffer.
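A hedged usage sketch with container/ring; the exact ring position ExtractTokens expects is an assumption here, and the window size of 64 is illustrative:

import "container/ring"

lastNTokens := ring.New(64) // rolling window of the last 64 token ids
for i := 0; i < 64; i++ {   // pre-fill so every slot holds a uint32
	lastNTokens.Value = uint32(0)
	lastNTokens = lastNTokens.Next()
}

// After sampling each new token id:
lastNTokens.Value = newToken
lastNTokens = lastNTokens.Next()

// Flatten the most recent 8 entries into a plain slice:
recent := llama.ExtractTokens(lastNTokens.Move(-1), 8)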
func Resize ¶
func Resize(slice []float32, size int) []float32

Resize is a safe replacement for C++ std::vector::resize(). https://go.dev/play/p/VlQ7N75E5AD
func ResizeInplace ¶
func ResizeInplace(slice *[]float32, size int)

NB! ResizeInplace does not clear the underlying array when resizing. https://go.dev/play/p/DbK4dFqwrZn
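To make the contrast concrete, a sketch of the two behaviors implied by the doc comments and the linked playground snippets; this is an assumption about intent, not a copy of the package internals:

// Resize always allocates a fresh zero-initialized slice and copies the
// overlapping prefix, so any elements gained by growing are zero ("safe").
func Resize(slice []float32, size int) []float32 {
	out := make([]float32, size)
	copy(out, slice)
	return out
}

// ResizeInplace reslices (or appends to) the existing backing array, so
// growing within capacity re-exposes whatever values were there before.
func ResizeInplace(slice *[]float32, size int) {
	if size <= cap(*slice) {
		*slice = (*slice)[:size]
		return
	}
	*slice = append(*slice, make([]float32, size-len(*slice))...)
}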
func SampleTopPTopK ¶
func SampleTopPTopK(
	logits []float32,
	lastNTokens *ring.Ring,
	lastNTokensSize uint32,
	topK uint32,
	topP float32,
	temp float32,
	repeatPenalty float32,
) uint32
SampleTopPTopK samples the next token given the logits over the vocabulary (see the sketch after this list):
- consider only the top K tokens
- from them, consider only the top tokens with cumulative probability > P
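A minimal, self-contained sketch of that two-stage filter; it illustrates the general top-K / top-P technique, not this package's exact temperature, repeat-penalty, or tie-breaking behavior:

import (
	"math"
	"math/rand"
	"sort"
)

// sampleTopKTopP picks a token id from raw logits using top-K then top-P.
func sampleTopKTopP(logits []float32, topK int, topP float64, rng *rand.Rand) int {
	// Rank all token ids by logit, highest first, and keep the top K.
	ids := make([]int, len(logits))
	for i := range ids {
		ids[i] = i
	}
	sort.Slice(ids, func(a, b int) bool { return logits[ids[a]] > logits[ids[b]] })
	if topK < len(ids) {
		ids = ids[:topK]
	}

	// Softmax over the survivors (subtract the max logit for stability).
	maxLogit := float64(logits[ids[0]])
	probs := make([]float64, len(ids))
	var sum float64
	for i, id := range ids {
		probs[i] = math.Exp(float64(logits[id]) - maxLogit)
		sum += probs[i]
	}

	// Keep the smallest prefix whose cumulative probability exceeds P.
	var cum float64
	cut := len(ids)
	for i := range probs {
		probs[i] /= sum
		cum += probs[i]
		if cum > topP {
			cut = i + 1
			break
		}
	}
	ids, probs = ids[:cut], probs[:cut]

	// Sample proportionally from the remaining probability mass.
	r := rng.Float64() * cum
	for i, id := range ids {
		r -= probs[i]
		if r <= 0 {
			return id
		}
	}
	return ids[len(ids)-1]
}

Seeding with rand.New(rand.NewSource(seed)) makes the draws reproducible for a fixed seed, which mirrors the Seed fields exposed by ContextParams and ModelParams below.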
Types ¶
type Context ¶
type Context struct {
	Logits    []float32 // decode output 2D array [tokensCount][vocabSize]
	Embedding []float32 // input embedding 1D array [embdSize]
	MLContext *ml.Context
	// contains filtered or unexported fields
}
Context is the context of the model.
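As a hedged illustration of the Logits layout described above; tokensCount and vocabSize are assumed to be known by the caller:

// With LogitsAll enabled, Logits holds one row of vocabSize values per
// evaluated token; the row for the most recent token starts here:
lastRow := ctx.Logits[(tokensCount-1)*vocabSize : tokensCount*vocabSize]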
func NewContext ¶
func NewContext(model *Model, params *ModelParams) *Context
NewContext creates a new context.
func (*Context) ReleaseContext ¶ added in v1.4.0
func (ctx *Context) ReleaseContext()
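ReleaseContext pairs naturally with NewContext; a typical hedged usage, assuming it frees the context's internal buffers:

ctx := llama.NewContext(model, params)
defer ctx.ReleaseContext() // release internal buffers when done with the context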
type ContextParams ¶
type ContextParams struct {
	CtxSize    uint32 // text context
	PartsCount int    // -1 for default
	Seed       int    // RNG seed, 0 for random

	LogitsAll bool // the llama_eval() call computes all logits, not just the last one
	VocabOnly bool // only load the vocabulary, no weights
	UseLock   bool // force system to keep model in RAM
	Embedding bool // embedding mode only
}
ContextParams are the parameters for the context (mirroring struct llama_context_params in llama.cpp).
type HParams ¶
type HParams struct {
// contains filtered or unexported fields
}
HParams are the hyperparameters of the model (LLaMA-7B values are given as an example in the source comments).
type KVCache ¶
type KVCache struct {
	K *ml.Tensor
	V *ml.Tensor

	N uint32 // number of tokens currently in the cache
}

KVCache is a key-value cache for the self-attention.
type Layer ¶
type Layer struct {
// contains filtered or unexported fields
}
Layer is a single layer of the model.
type Model ¶
type Model struct {
	Type ModelType
	// contains filtered or unexported fields
}

Model is the representation of any NN model (LLaMA included).
func LoadModel ¶
func LoadModel(fileName string, params ModelParams, silent bool) (*Context, error)

LoadModel loads a model's weights from a file. See convert-pth-to-ggml.py for details on the format.
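A hedged loading sketch; the model path is a placeholder and params is assumed to be a populated ModelParams (see below):

ctx, err := llama.LoadModel("./models/7B/ggml-model.bin", params, false /* silent */)
if err != nil {
	log.Fatal(err)
}
defer ctx.ReleaseContext()
// Evaluation and sampling then proceed with Eval and SampleTopPTopK
// as sketched above.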
func NewModel ¶
func NewModel(params *ModelParams) *Model
NewModel creates a new model with default hyperparameters.
type ModelParams ¶ added in v1.2.0
type ModelParams struct {
	Model  string // model path
	Prompt string

	MaxThreads int

	UseAVX  bool
	UseNEON bool

	Seed int

	PredictCount uint32 // new tokens to predict
	RepeatLastN  uint32 // last n tokens to penalize
	PartsCount   int    // amount of model parts (-1 = determine from model dimensions)
	CtxSize      uint32 // context size
	BatchSize    uint32 // batch size for prompt processing
	KeepCount    uint32

	TopK          uint32  // 40
	TopP          float32 // 0.95
	Temp          float32 // 0.80
	RepeatPenalty float32 // 1.10

	InputPrefix string   // string to prefix user inputs with
	Antiprompt  []string // string upon seeing which more user input is prompted

	MemoryFP16   bool // use f16 instead of f32 for memory kv
	RandomPrompt bool // do not randomize prompt if none provided
	UseColor     bool // use color to distinguish generations and inputs
	Interactive  bool // interactive mode

	Embedding        bool // get only sentence embedding
	InteractiveStart bool // wait for user input immediately

	Instruct   bool // instruction mode (used for Alpaca models)
	IgnoreEOS  bool // do not stop generating after eos
	Perplexity bool // compute perplexity over the prompt
	UseMLock   bool // use mlock to keep model in memory
	MemTest    bool // compute maximum memory usage

	VerbosePrompt bool
}
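For illustration, populating ModelParams with the defaults suggested by the field comments; the path is a placeholder, and the fields without commented defaults carry illustrative assumed values:

params := llama.ModelParams{
	Model:         "./models/7B/ggml-model.bin", // placeholder path
	MaxThreads:    4,   // illustrative
	CtxSize:       512, // illustrative
	BatchSize:     8,   // illustrative
	PredictCount:  128, // illustrative
	RepeatLastN:   64,  // illustrative
	PartsCount:    -1,  // determine from model dimensions
	TopK:          40,
	TopP:          0.95,
	Temp:          0.80,
	RepeatPenalty: 1.10,
}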