gguf_parser

package module

v0.5.5 | Published: Aug 6, 2024 | License: MIT | Imports: 27 | Imported by: 4

Repository

github.com/gpustack/gguf-parser-go

README ¶

GGUF Parser

TL;DR: a Go parser for GGUF files.

GGUF is a file format for storing models for inference with GGML and executors based on GGML. GGUF is a binary format that is designed for fast loading and saving of models, and for ease of reading. Models are traditionally developed using PyTorch or another framework, and then converted to GGUF for use in GGML.

GGUF Parser provides functions to parse GGUF files in Go for the following purposes:

Read metadata from a remote GGUF file without downloading the whole model.
Estimate the model's memory usage.

Install the package as below.

go get github.com/gpustack/gguf-parser-go

If you need a one-shot command-line tool, try gguf-parser from the releases, or run go install github.com/gpustack/gguf-parser-go/cmd/gguf-parser to build it from HEAD.

Calls

flowchart
    parseGGUFFileRemote[/parseGGUFFileRemote/]
    parseGGUFFile[/parseGGUFFile/]
    ParseGGUFFile -.-> parseGGUFFile
    ParseGGUFFileFromHuggingFace -.-> ParseGGUFFileRemote
    ParseGGUFFileFromModelScope -.-> ParseGGUFFileRemote
    ParseGGUFFileRemote -.-> parseGGUFFileRemote
    parseGGUFFileRemote -.-> parseGGUFFile
    ParseGGUFFileFromOllama -.-> ParseGGUFFileFromOllamaModel
    ParseGGUFFileFromOllamaModel -.-> parseGGUFFileRemote

Examples

Load model

import (
    "github.com/davecgh/go-spew/spew"
    . "github.com/gpustack/gguf-parser-go"
)

f, err := ParseGGUFFile("path/to/model.gguf")
if err != nil {
    panic(err)
}

spew.Dump(f)

Use MMap

f, err := ParseGGUFFile("path/to/model.gguf", UseMMap())
if err != nil {
    panic(err)
}

Skip large metadata

f, err := ParseGGUFFile("path/to/model.gguf", SkipLargeMetadata())
if err != nil {
    panic(err)
}

Load model from remote

import (
    "context"
    "github.com/davecgh/go-spew/spew"
    . "github.com/gpustack/gguf-parser-go"
)

f, err := ParseGGUFFileRemote(context.Background(), "https://example.com/model.gguf")
if err != nil {
    panic(err)
}

spew.Dump(f)

Adjust requesting buffer size

f, err := ParseGGUFFileRemote(context.Background(), "https://example.com/model.gguf", UseBufferSize(1 * 1024 * 1024) /* 1M */)
if err != nil {
    panic(err)
}

View information

// Model
spew.Dump(f.Model())

// Architecture
spew.Dump(f.Architecture())

// Tokenizer
spew.Dump(f.Tokenizer())

Estimate usage in llama.cpp

The estimated result is close to the usage observed when running llama-cli (examples/main/main.cpp).

es := f.EstimateLLaMACppUsage()
spew.Dump(es)

// Since the estimated result is detailed and lacks context,
// you can summarize the result as below.
s := es.Summarize(true /* load via mmap */, 0, 0 /* no unified memory RAM, VRAM footprint */)
spew.Dump(s)

Estimate with larger prompt

es := f.EstimateLLaMACppUsage(WithContextSize(4096) /* Use 4k context */)
spew.Dump(es)

// Since the estimated result is detailed and lacks context,
// you can summarize the result as below.
s := es.Summarize(true /* load via mmap */, 0, 0 /* no unified memory RAM, VRAM footprint */)
spew.Dump(s)

Estimate with specific offload layers

es := f.EstimateLLaMACppUsage(WithOffloadLayers(10) /* Offload last 10 layers to GPU */)
spew.Dump(es)

// Since the estimated result is detailed and lacks context,
// you can summarize the result as below.
s := es.Summarize(true /* load via mmap */, 0, 0 /* no unified memory RAM, VRAM footprint */)
spew.Dump(s)

License

MIT

Documentation ¶

Index ¶

Constants
Variables
func GGMLComputationGraphOverhead(nodes uint64, grads bool) uint64
func GGMLHashSize(base uint64) uint64
func GGMLMemoryPadding(size uint64) uint64
func GGMLPadding(size, align uint64) uint64
func GGMLTensorOverhead() uint64
func ValueNumeric[T constraints.Integer | constraints.Float](kv GGUFMetadataKV) T
func ValuesNumeric[T constraints.Integer | constraints.Float](av GGUFMetadataKVArrayValue) []T
type GGMLType
- func (t GGMLType) RowSizeOf(dimensions []uint64) uint64
- func (i GGMLType) String() string
- func (t GGMLType) Trait() (GGMLTypeTrait, bool)
type GGMLTypeTrait
type GGUFArchitectureMetadata
type GGUFBitsPerWeightScalar
- func (s GGUFBitsPerWeightScalar) String() string
type GGUFBytesScalar
- func (s GGUFBytesScalar) String() string
type GGUFFile
- func ParseGGUFFile(path string, opts ...GGUFReadOption) (*GGUFFile, error)
- func ParseGGUFFileFromHuggingFace(ctx context.Context, repo, file string, opts ...GGUFReadOption) (*GGUFFile, error)
- func ParseGGUFFileFromModelScope(ctx context.Context, repo, file string, opts ...GGUFReadOption) (*GGUFFile, error)
- func ParseGGUFFileFromOllama(ctx context.Context, model string, opts ...GGUFReadOption) (*GGUFFile, error)
- func ParseGGUFFileFromOllamaModel(ctx context.Context, model *OllamaModel, opts ...GGUFReadOption) (gf *GGUFFile, err error)
- func ParseGGUFFileRemote(ctx context.Context, url string, opts ...GGUFReadOption) (gf *GGUFFile, err error)
- func (gf *GGUFFile) Architecture() (ga GGUFArchitectureMetadata)
- func (gf *GGUFFile) EstimateLLaMACppUsage(opts ...LLaMACppUsageEstimateOption) (e LLaMACppUsageEstimate)
- func (gf *GGUFFile) Layers(ignores ...string) GGUFLayerTensorInfos
- func (gf *GGUFFile) Model() (gm GGUFModelMetadata)
- func (gf *GGUFFile) Tokenizer() (gt GGUFTokenizerMetadata)
type GGUFFileCache
- func (c GGUFFileCache) Delete(key string) error
- func (c GGUFFileCache) Get(key string, exp time.Duration) (*GGUFFile, error)
- func (c GGUFFileCache) Put(key string, gf *GGUFFile) error
type GGUFFileType
- func (t GGUFFileType) GGMLType() GGMLType
- func (i GGUFFileType) String() string
type GGUFFilename
- func ParseGGUFFilename(name string) *GGUFFilename
- func (gn GGUFFilename) IsPreRelease() bool
- func (gn GGUFFilename) IsSharding() bool
- func (gn GGUFFilename) String() string
type GGUFHeader
type GGUFLayerTensorInfos
- func (ltis GGUFLayerTensorInfos) Bytes() uint64
- func (ltis GGUFLayerTensorInfos) Count() uint64
- func (ltis GGUFLayerTensorInfos) Cut(names []string) (before, after GGUFLayerTensorInfos, found bool)
- func (ltis GGUFLayerTensorInfos) Elements() uint64
- func (ltis GGUFLayerTensorInfos) Get(name string) (info GGUFTensorInfo, found bool)
- func (ltis GGUFLayerTensorInfos) Index(names []string) (infos map[string]GGUFTensorInfo, found int)
- func (ltis GGUFLayerTensorInfos) Search(nameRegex *regexp.Regexp) (infos []GGUFTensorInfo)
type GGUFMagic
- func (i GGUFMagic) String() string
type GGUFMetadataKV
- func (kv GGUFMetadataKV) ValueArray() GGUFMetadataKVArrayValue
- func (kv GGUFMetadataKV) ValueBool() bool
- func (kv GGUFMetadataKV) ValueFloat32() float32
- func (kv GGUFMetadataKV) ValueFloat64() float64
- func (kv GGUFMetadataKV) ValueInt16() int16
- func (kv GGUFMetadataKV) ValueInt32() int32
- func (kv GGUFMetadataKV) ValueInt64() int64
- func (kv GGUFMetadataKV) ValueInt8() int8
- func (kv GGUFMetadataKV) ValueString() string
- func (kv GGUFMetadataKV) ValueUint16() uint16
- func (kv GGUFMetadataKV) ValueUint32() uint32
- func (kv GGUFMetadataKV) ValueUint64() uint64
- func (kv GGUFMetadataKV) ValueUint8() uint8
type GGUFMetadataKVArrayValue
- func (av GGUFMetadataKVArrayValue) ValuesArray() []GGUFMetadataKVArrayValue
- func (av GGUFMetadataKVArrayValue) ValuesBool() []bool
- func (av GGUFMetadataKVArrayValue) ValuesFloat32() []float32
- func (av GGUFMetadataKVArrayValue) ValuesFloat64() []float64
- func (av GGUFMetadataKVArrayValue) ValuesInt16() []int16
- func (av GGUFMetadataKVArrayValue) ValuesInt32() []int32
- func (av GGUFMetadataKVArrayValue) ValuesInt64() []int64
- func (av GGUFMetadataKVArrayValue) ValuesInt8() []int8
- func (av GGUFMetadataKVArrayValue) ValuesString() []string
- func (av GGUFMetadataKVArrayValue) ValuesUint16() []uint16
- func (av GGUFMetadataKVArrayValue) ValuesUint32() []uint32
- func (av GGUFMetadataKVArrayValue) ValuesUint64() []uint64
- func (av GGUFMetadataKVArrayValue) ValuesUint8() []uint8
type GGUFMetadataKVs
- func (kvs GGUFMetadataKVs) Get(key string) (value GGUFMetadataKV, found bool)
- func (kvs GGUFMetadataKVs) Index(keys []string) (values map[string]GGUFMetadataKV, found int)
- func (kvs GGUFMetadataKVs) Search(keyRegex *regexp.Regexp) (values []GGUFMetadataKV)
type GGUFMetadataValueType
- func (i GGUFMetadataValueType) String() string
type GGUFModelMetadata
type GGUFNamedTensorInfos
type GGUFParametersScalar
- func (s GGUFParametersScalar) String() string
type GGUFReadOption
- func SkipCache() GGUFReadOption
- func SkipDNSCache() GGUFReadOption
- func SkipLargeMetadata() GGUFReadOption
- func SkipProxy() GGUFReadOption
- func SkipRangeDownloadDetection() GGUFReadOption
- func SkipTLSVerification() GGUFReadOption
- func UseBearerAuth(token string) GGUFReadOption
- func UseBufferSize(size int) GGUFReadOption
- func UseCache() GGUFReadOption
- func UseCacheExpiration(expiration time.Duration) GGUFReadOption
- func UseCachePath(path string) GGUFReadOption
- func UseDebug() GGUFReadOption
- func UseMMap() GGUFReadOption
- func UseProxy(url *url.URL) GGUFReadOption
type GGUFTensorInfo
- func (ti GGUFTensorInfo) Bytes() uint64
- func (ti GGUFTensorInfo) Count() uint64
- func (ti GGUFTensorInfo) Elements() uint64
- func (ti GGUFTensorInfo) Get(name string) (info GGUFTensorInfo, found bool)
- func (ti GGUFTensorInfo) Index(names []string) (infos map[string]GGUFTensorInfo, found int)
- func (ti GGUFTensorInfo) Search(nameRegex *regexp.Regexp) (infos []GGUFTensorInfo)
type GGUFTensorInfos
- func (tis GGUFTensorInfos) Bytes() uint64
- func (tis GGUFTensorInfos) Count() uint64
- func (tis GGUFTensorInfos) Elements() uint64
- func (tis GGUFTensorInfos) Get(name string) (info GGUFTensorInfo, found bool)
- func (tis GGUFTensorInfos) Index(names []string) (infos map[string]GGUFTensorInfo, found int)
- func (tis GGUFTensorInfos) Search(nameRegex *regexp.Regexp) (infos []GGUFTensorInfo)
type GGUFTokenizerMetadata
type GGUFVersion
- func (i GGUFVersion) String() string
type IGGUFTensorInfos
type LLaMACppComputationUsage
- func (u LLaMACppComputationUsage) Sum() GGUFBytesScalar
type LLaMACppKVCacheUsage
- func (u LLaMACppKVCacheUsage) Sum() GGUFBytesScalar
type LLaMACppMemoryUsage
type LLaMACppUsageEstimate
- func (e LLaMACppUsageEstimate) Summarize(mmap bool, nonUMARamFootprint, nonUMAVramFootprint uint64) (es LLaMACppUsageEstimateSummary)
- func (e LLaMACppUsageEstimate) SummarizeMemory(mmap bool, nonUMARamFootprint, nonUMAVramFootprint uint64) (ems LLaMACppUsageEstimateMemorySummary)
type LLaMACppUsageEstimateMemorySummary
type LLaMACppUsageEstimateOption
- func WithArchitecture(arch GGUFArchitectureMetadata) LLaMACppUsageEstimateOption
- func WithCacheKeyType(t GGMLType) LLaMACppUsageEstimateOption
- func WithCacheValueType(t GGMLType) LLaMACppUsageEstimateOption
- func WithContextSize(size int32) LLaMACppUsageEstimateOption
- func WithDrafter(dft *LLaMACppUsageEstimate) LLaMACppUsageEstimateOption
- func WithFlashAttention() LLaMACppUsageEstimateOption
- func WithLogicalBatchSize(size int32) LLaMACppUsageEstimateOption
- func WithMultimodalProjector(mmp *LLaMACppUsageEstimate) LLaMACppUsageEstimateOption
- func WithOffloadLayers(layers uint64) LLaMACppUsageEstimateOption
- func WithParallelSize(size int32) LLaMACppUsageEstimateOption
- func WithPhysicalBatchSize(size int32) LLaMACppUsageEstimateOption
- func WithTokenizer(tokenizer GGUFTokenizerMetadata) LLaMACppUsageEstimateOption
- func WithinMaxContextSize() LLaMACppUsageEstimateOption
- func WithoutOffloadKVCache() LLaMACppUsageEstimateOption
type LLaMACppUsageEstimateSummary
type LLaMACppWeightUsage
- func (u LLaMACppWeightUsage) Sum() GGUFBytesScalar
type OllamaModel
- func ParseOllamaModel(model string) *OllamaModel
- func (om *OllamaModel) Complete(ctx context.Context, cli *http.Client) error
- func (om *OllamaModel) GetLayer(mediaType string) (OllamaModelLayer, bool)
- func (om *OllamaModel) License(ctx context.Context, cli *http.Client) ([]string, error)
- func (om *OllamaModel) Messages(ctx context.Context, cli *http.Client) ([]json.RawMessage, error)
- func (om *OllamaModel) Params(ctx context.Context, cli *http.Client) (map[string]any, error)
- func (om *OllamaModel) SearchLayers(mediaTypeRegex *regexp.Regexp) []OllamaModelLayer
- func (om *OllamaModel) String() string
- func (om *OllamaModel) System(ctx context.Context, cli *http.Client) (string, error)
- func (om *OllamaModel) Template(ctx context.Context, cli *http.Client) (string, error)
- func (om *OllamaModel) WebPageURL() *url.URL
type OllamaModelLayer
- func (ol *OllamaModelLayer) BlobURL() *url.URL
- func (ol *OllamaModelLayer) FetchBlob(ctx context.Context, cli *http.Client) ([]byte, error)
- func (ol *OllamaModelLayer) FetchBlobFunc(ctx context.Context, cli *http.Client, process func(*http.Response) error) error

Constants ¶

const (
	// GGMLTensorSize is the size of GGML tensor in bytes,
	// see https://github.com/ggerganov/ggml/blob/0cbb7c0e053f5419cfbebb46fbf4d4ed60182cf5/include/ggml/ggml.h#L606.
	GGMLTensorSize = 368

	// GGMLObjectSize is the size of GGML object in bytes,
	// see https://github.com/ggerganov/ggml/blob/0cbb7c0e053f5419cfbebb46fbf4d4ed60182cf5/include/ggml/ggml.h#L563.
	GGMLObjectSize = 32
)

GGML tensor constants.

const (
	// GGMLComputationGraphSize is the size of GGML computation graph in bytes.
	GGMLComputationGraphSize = 80

	// GGMLComputationGraphNodesMaximum is the maximum nodes of the computation graph,
	// see https://github.com/ggerganov/llama.cpp/blob/7672adeec7a79ea271058c63106c142ba84f951a/llama.cpp#L103.
	GGMLComputationGraphNodesMaximum = 8192

	// GGMLComputationGraphNodesDefault is the default nodes of the computation graph,
	// see https://github.com/ggerganov/ggml/blob/0cbb7c0e053f5419cfbebb46fbf4d4ed60182cf5/include/ggml/ggml.h#L237.
	GGMLComputationGraphNodesDefault = 2048
)

GGML computation graph constants.

const (
	OllamaDefaultScheme    = "https"
	OllamaDefaultRegistry  = "ollama.com"
	OllamaDefaultNamespace = "library"
	OllamaDefaultTag       = "latest"
)

Variables ¶

var (
	ErrGGUFFileCacheDisabled  = errors.New("GGUF file cache disabled")
	ErrGGUFFileCacheMissed    = errors.New("GGUF file cache missed")
	ErrGGUFFileCacheCorrupted = errors.New("GGUF file cache corrupted")
)

var (
	ErrOllamaInvalidModel      = errors.New("ollama invalid model")
	ErrOllamaBaseLayerNotFound = errors.New("ollama base layer not found")
)

var ErrGGUFFileInvalidFormat = errors.New("invalid GGUF format")

var GGUFFilenameRegex = regexp.MustCompile(`^(?P<model_name>[A-Za-z0-9\s-]+)(?:-v(?P<major>\d+)\.(?P<minor>\d+))?-(?:(?P<experts_count>\d+)x)?(?P<parameters>\d+[A-Za-z]?)-(?P<encoding_scheme>[\w_]+)(?:-(?P<shard>\d{5})-of-(?P<shardTotal>\d{5}))?\.gguf$`) // nolint:lll

var InMiBytes bool

Functions ¶

func GGMLComputationGraphOverhead ¶

func GGMLComputationGraphOverhead(nodes uint64, grads bool) uint64

GGMLComputationGraphOverhead is the overhead of GGML graph in bytes, see https://github.com/ggerganov/ggml/blob/0cbb7c0e053f5419cfbebb46fbf4d4ed60182cf5/src/ggml.c#L18905-L18917.

func GGMLHashSize ¶

func GGMLHashSize(base uint64) uint64

GGMLHashSize returns the size of the hash table for the given base, see https://github.com/ggerganov/ggml/blob/0cbb7c0e053f5419cfbebb46fbf4d4ed60182cf5/src/ggml.c#L17698-L17722.

func GGMLMemoryPadding ¶

func GGMLMemoryPadding(size uint64) uint64

GGMLMemoryPadding returns the padded size of the given size according to GGML memory padding, see https://github.com/ggerganov/ggml/blob/0cbb7c0/include/ggml/ggml.h#L238-L243.

func GGMLPadding ¶

func GGMLPadding(size, align uint64) uint64

GGMLPadding returns the padded size of the given size according to given align, see https://github.com/ggerganov/ggml/blob/0cbb7c0e053f5419cfbebb46fbf4d4ed60182cf5/include/ggml/ggml.h#L255.
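As a rough illustration of the two padding helpers above, the sketch below rounds a size up to a multiple of a power-of-two alignment, in the manner of GGML's GGML_PAD macro. The names padTo and memAlign are hypothetical, and the 16-byte alignment assumes a 64-bit platform; the package's own implementation may differ.

```go
package main

import "fmt"

// memAlign is assumed to be GGML's memory alignment on 64-bit platforms.
const memAlign uint64 = 16

// padTo rounds size up to the next multiple of align.
// align must be a power of two for the bit trick to hold.
func padTo(size, align uint64) uint64 {
	return (size + align - 1) &^ (align - 1)
}

func main() {
	fmt.Println(padTo(13, 8))         // 16
	fmt.Println(padTo(16, 8))         // 16
	fmt.Println(padTo(100, memAlign)) // 112
}
```

Sizes that are already aligned are returned unchanged, so padding can be applied repeatedly without growing the result.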

func GGMLTensorOverhead ¶

func GGMLTensorOverhead() uint64

GGMLTensorOverhead is the overhead of GGML tensor in bytes, see https://github.com/ggerganov/ggml/blob/0cbb7c0e053f5419cfbebb46fbf4d4ed60182cf5/src/ggml.c#L2765-L2767.
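A small worked example, assuming the overhead is one GGML object header plus one tensor struct (the sum of the GGMLObjectSize and GGMLTensorSize constants documented above). The names below are hypothetical and the tensor count is made up for illustration.

```go
package main

import "fmt"

const (
	tensorSize uint64 = 368 // GGMLTensorSize from the constants above
	objectSize uint64 = 32  // GGMLObjectSize from the constants above
)

// tensorOverhead assumes the per-tensor overhead is
// one object header plus one tensor struct.
func tensorOverhead() uint64 {
	return objectSize + tensorSize
}

func main() {
	fmt.Println(tensorOverhead()) // 400
	// Bookkeeping bytes for a hypothetical model with 291 tensors.
	fmt.Println(291 * tensorOverhead()) // 116400
}
```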

func ValueNumeric ¶

func ValueNumeric[T constraints.Integer | constraints.Float](kv GGUFMetadataKV) T

ValueNumeric returns the numeric value of the GGUFMetadataKV, and panics if the value type is not numeric.

ValueNumeric is a generic function, and the type T must be constraints.Integer or constraints.Float.

Compared to the GGUFMetadataKV's Value* functions, ValueNumeric will cast the original value to the target type.

func ValuesNumeric ¶

func ValuesNumeric[T constraints.Integer | constraints.Float](av GGUFMetadataKVArrayValue) []T

ValuesNumeric returns the numeric values of the GGUFMetadataKVArrayValue, and panics if the value type is not numeric.

ValuesNumeric is a generic function, and the type T must be constraints.Integer or constraints.Float.

Compared to the GGUFMetadataKVArrayValue's Values* functions, ValuesNumeric will cast the original values to the target type.
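To illustrate the casting behaviour described for ValueNumeric and ValuesNumeric, here is a minimal standalone sketch of a generic numeric cast. It does not use this package: number and castNumeric are hypothetical names, and number is only a local stand-in for constraints.Integer | constraints.Float.

```go
package main

import "fmt"

// number is a local stand-in for constraints.Integer | constraints.Float.
type number interface {
	~int8 | ~int16 | ~int32 | ~int64 |
		~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// castNumeric converts a dynamically typed numeric value to T,
// and panics when the value is not numeric.
func castNumeric[T number](v any) T {
	switch x := v.(type) {
	case int8:
		return T(x)
	case int16:
		return T(x)
	case int32:
		return T(x)
	case int64:
		return T(x)
	case uint8:
		return T(x)
	case uint16:
		return T(x)
	case uint32:
		return T(x)
	case uint64:
		return T(x)
	case float32:
		return T(x)
	case float64:
		return T(x)
	}
	panic(fmt.Sprintf("value of type %T is not numeric", v))
}

func main() {
	fmt.Println(castNumeric[uint64](uint32(4096))) // 4096
	fmt.Println(castNumeric[float64](int8(-2)))    // -2
}
```

The caller picks the target type once, which avoids a type switch at every call site when a metadata key may be stored with different integer widths.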

Types ¶

type GGMLType ¶

type GGMLType uint32

GGMLType is a type of GGML tensor, see https://github.com/ggerganov/llama.cpp/blob/278d0e18469aacf505be18ce790a63c7cc31be26/ggml/include/ggml.h#L354-L390.

const (
	GGMLTypeF32 GGMLType = iota
	GGMLTypeF16
	GGMLTypeQ4_0
	GGMLTypeQ4_1
	GGMLTypeQ4_2
	GGMLTypeQ4_3
	GGMLTypeQ5_0
	GGMLTypeQ5_1
	GGMLTypeQ8_0
	GGMLTypeQ8_1
	GGMLTypeQ2_K
	GGMLTypeQ3_K
	GGMLTypeQ4_K
	GGMLTypeQ5_K
	GGMLTypeQ6_K
	GGMLTypeQ8_K
	GGMLTypeIQ2_XXS
	GGMLTypeIQ2_XS
	GGMLTypeIQ3_XXS
	GGMLTypeIQ1_S
	GGMLTypeIQ4_NL
	GGMLTypeIQ3_S
	GGMLTypeIQ2_S
	GGMLTypeIQ4_XS
	GGMLTypeI8
	GGMLTypeI16
	GGMLTypeI32
	GGMLTypeI64
	GGMLTypeF64
	GGMLTypeIQ1_M
	GGMLTypeBF16
	GGMLTypeQ4_0_4_4
	GGMLTypeQ4_0_4_8
	GGMLTypeQ4_0_8_8
)

GGMLType constants.

GGMLTypeQ4_2, GGMLTypeQ4_3 are deprecated.

func (GGMLType) RowSizeOf ¶

func (t GGMLType) RowSizeOf(dimensions []uint64) uint64

RowSizeOf returns the size of the given dimensions according to the GGMLType's GGMLTypeTrait, which is inspired by https://github.com/ggerganov/ggml/blob/0cbb7c0e053f5419cfbebb46fbf4d4ed60182cf5/src/ggml.c#L3142-L3145.

The index into the given dimensions identifies the dimension, i.e. 0 is the first dimension, 1 is the second dimension, and so on.

The value of the item is the number of elements in the corresponding dimension.
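The calculation can be sketched as follows, assuming the first dimension is stored in blocks of BlockSize elements occupying TypeSize bytes each, and the remaining dimensions multiply the number of rows. The typeTrait struct and the trait values below are assumptions for illustration, not taken from this package.

```go
package main

import "fmt"

// typeTrait mirrors the BlockSize and TypeSize fields of GGMLTypeTrait.
type typeTrait struct {
	BlockSize uint64 // elements per block
	TypeSize  uint64 // bytes per block
}

// Commonly cited values for two GGML types; treat them as assumptions.
var (
	traitF16  = typeTrait{BlockSize: 1, TypeSize: 2}
	traitQ4_0 = typeTrait{BlockSize: 32, TypeSize: 18}
)

// rowSizeOf returns the bytes needed for a tensor of the given dimensions.
func rowSizeOf(t typeTrait, dimensions []uint64) uint64 {
	if len(dimensions) == 0 {
		return 0
	}
	// Bytes of one row along the first dimension.
	size := t.TypeSize * dimensions[0] / t.BlockSize
	// Every further dimension repeats that row.
	for _, d := range dimensions[1:] {
		size *= d
	}
	return size
}

func main() {
	fmt.Println(rowSizeOf(traitF16, []uint64{4096}))        // 8192
	fmt.Println(rowSizeOf(traitQ4_0, []uint64{4096, 4096})) // 9437184
}
```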

func (GGMLType) String ¶

func (i GGMLType) String() string

func (GGMLType) Trait ¶

func (t GGMLType) Trait() (GGMLTypeTrait, bool)

Trait returns the GGMLTypeTrait of the GGMLType.

type GGMLTypeTrait ¶

type GGMLTypeTrait struct {
	BlockSize uint64 // Original is int, in order to reduce conversion, here we use uint64.
	TypeSize  uint64 // Original is uint32, in order to reduce conversion, here we use uint64.
	Quantized bool
}

GGMLTypeTrait holds the trait of a GGMLType, see https://github.com/ggerganov/llama.cpp/blob/278d0e18469aacf505be18ce790a63c7cc31be26/ggml/src/ggml.c#L547-L942.

type GGUFArchitectureMetadata ¶

type GGUFArchitectureMetadata struct {

	// Architecture describes what architecture this model implements.
	//
	// All lowercase ASCII, with only [a-z0-9]+ characters allowed.
	Architecture string `json:"architecture"`
	// MaximumContextLength(n_ctx_train) is the maximum context length of the model.
	//
	// For most architectures, this is the hard limit on the length of the input.
	// Architectures, like RWKV,
	// that are not reliant on transformer-style attention may be able to handle larger inputs,
	// but this is not guaranteed.
	MaximumContextLength uint64 `json:"maximumContextLength"`
	// EmbeddingLength(n_embd) is the length of the embedding layer.
	EmbeddingLength uint64 `json:"embeddingLength"`
	// BlockCount(n_layer) is the number of blocks of attention and feed-forward layers,
	// i.e. the bulk of the LLM.
	// This does not include the input or embedding layers.
	BlockCount uint64 `json:"blockCount"`
	// FeedForwardLength(n_ff) is the length of the feed-forward layer.
	FeedForwardLength uint64 `json:"feedForwardLength,omitempty"`
	// ExpertFeedForwardLength(expert_feed_forward_length) is the length of the feed-forward layer in the expert model.
	ExpertFeedForwardLength uint64 `json:"expertFeedForwardLength,omitempty"`
	// ExpertSharedFeedForwardLength(expert_shared_feed_forward_length) is the length of the shared feed-forward layer in the expert model.
	ExpertSharedFeedForwardLength uint64 `json:"expertSharedFeedForwardLength,omitempty"`
	// ExpertCount(n_expert) is the number of experts in MoE models.
	ExpertCount uint32 `json:"expertCount,omitempty"`
	// ExpertUsedCount(n_expert_used) is the number of experts used during each token evaluation in MoE models.
	ExpertUsedCount uint32 `json:"expertUsedCount,omitempty"`
	// AttentionHeadCount(n_head) is the number of attention heads.
	AttentionHeadCount uint64 `json:"attentionHeadCount,omitempty"`
	// AttentionHeadCountKV(n_head_kv) is the number of attention heads per group used in Grouped-Query-Attention.
	//
	// If not provided or equal to AttentionHeadCount,
	// the model does not use Grouped-Query-Attention.
	AttentionHeadCountKV uint64 `json:"attentionHeadCountKV,omitempty"`
	// AttentionMaxALiBIBias is the maximum bias to use for ALiBI.
	AttentionMaxALiBIBias float32 `json:"attentionMaxALiBIBias,omitempty"`
	// AttentionClampKQV describes a value `C`,
	// which is used to clamp the values of the `Q`, `K` and `V` tensors between `[-C, C]`.
	AttentionClampKQV float32 `json:"attentionClampKQV,omitempty"`
	// AttentionLayerNormEpsilon is the epsilon value used in the LayerNorm(Layer Normalization).
	AttentionLayerNormEpsilon float32 `json:"attentionLayerNormEpsilon,omitempty"`
	// AttentionLayerNormRMSEpsilon is the epsilon value used in the RMSNorm (Root Mean Square Layer Normalization),
	// which is a simplification of the original LayerNorm.
	AttentionLayerNormRMSEpsilon float32 `json:"attentionLayerNormRMSEpsilon,omitempty"`
	// AttentionKeyLength(n_embd_head_k) is the size of a key head.
	//
	// Defaults to `EmbeddingLength / AttentionHeadCount`.
	AttentionKeyLength uint32 `json:"attentionKeyLength"`
	// AttentionValueLength(n_embd_head_v) is the size of a value head.
	//
	// Defaults to `EmbeddingLength / AttentionHeadCount`.
	AttentionValueLength uint32 `json:"attentionValueLength"`
	// AttentionCausal is true if the attention is causal.
	AttentionCausal bool `json:"attentionCausal,omitempty"`
	// RoPEDimensionCount is the number of dimensions in the RoPE(Rotary Positional Encoding).
	RoPEDimensionCount uint64 `json:"ropeDimensionCount,omitempty"`
	// RoPEFrequencyBase is the base frequency of the RoPE.
	RoPEFrequencyBase float32 `json:"ropeFrequencyBase,omitempty"`
	// RoPEScalingType is the scaling type of the RoPE.
	RoPEScalingType string `json:"ropeScalingType,omitempty"`
	// RoPEScalingFactor is the scaling factor of the RoPE.
	RoPEScalingFactor float32 `json:"ropeScalingFactor,omitempty"`
	// RoPEScalingOriginalContextLength is the original context length of the RoPE scaling.
	RoPEScalingOriginalContextLength uint64 `json:"ropeScalingOriginalContextLength,omitempty"`
	// RoPEScalingFinetuned is true if the RoPE scaling is fine-tuned.
	RoPEScalingFinetuned bool `json:"ropeScalingFinetuned,omitempty"`
	// SSMConvolutionKernel is the size of the convolution kernel used in the SSM(Selective State Space Model).
	SSMConvolutionKernel uint32 `json:"ssmConvolutionKernel,omitempty"`
	// SSMInnerSize is the embedding size of the state in SSM.
	SSMInnerSize uint32 `json:"ssmInnerSize,omitempty"`
	// SSMStateSize is the size of the recurrent state in SSM.
	SSMStateSize uint32 `json:"ssmStateSize,omitempty"`
	// SSMTimeStepRank is the rank of the time steps in SSM.
	SSMTimeStepRank uint32 `json:"ssmTimeStepRank,omitempty"`
	// VocabularyLength is the size of the vocabulary.
	//
	// VocabularyLength is the same as the tokenizer's token size.
	VocabularyLength uint64 `json:"vocabularyLength"`

	// EmbeddingGQA is the GQA of the embedding layer.
	EmbeddingGQA uint64 `json:"embeddingGQA,omitempty"`
	// EmbeddingKeyGQA is the number of key GQA in the embedding layer.
	EmbeddingKeyGQA uint64 `json:"embeddingKeyGQA,omitempty"`
	// EmbeddingValueGQA is the number of value GQA in the embedding layer.
	EmbeddingValueGQA uint64 `json:"embeddingValueGQA,omitempty"`

	// ClipHasTextEncoder indicates whether the clip model has text encoder or not.
	//
	// Only used when Architecture is "clip".
	ClipHasTextEncoder bool `json:"clipHasTextEncoder,omitempty"`
	// ClipHasVisionEncoder indicates whether the clip model has vision encoder or not.
	//
	// Only used when Architecture is "clip".
	ClipHasVisionEncoder bool `json:"clipHasVisionEncoder,omitempty"`
	// ClipHasLLaVaProjector indicates whether the clip model has LLaVa projector or not.
	//
	// Only used when Architecture is "clip".
	ClipHasLLaVaProjector bool `json:"clipHasLLaVaProjector,omitempty"`
	// ClipProjectorType is the type of the projector used in the clip model.
	//
	// Only used when Architecture is "clip".
	ClipProjectorType string `json:"clipProjectorType,omitempty"`
}

GGUFArchitectureMetadata represents the architecture metadata of a GGUF file.
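To show how several of these fields typically combine, the sketch below estimates the key/value cache size of a transformer-style model. It is a simplified approximation under stated assumptions (key and value head sizes default to EmbeddingLength / AttentionHeadCount, one cache entry per block), not the estimator used by this package; arch and kvCacheBytes are hypothetical names.

```go
package main

import "fmt"

// arch holds the subset of GGUFArchitectureMetadata fields used below.
type arch struct {
	EmbeddingLength      uint64
	BlockCount           uint64
	AttentionHeadCount   uint64
	AttentionHeadCountKV uint64
}

// kvCacheBytes estimates the key/value cache size for ctx tokens,
// storing each element in bytesPerElem bytes (2 for F16).
func kvCacheBytes(a arch, ctx, bytesPerElem uint64) uint64 {
	if a.AttentionHeadCount == 0 {
		return 0 // no attention heads, e.g. a non-transformer architecture
	}
	headKV := a.AttentionHeadCountKV
	if headKV == 0 {
		headKV = a.AttentionHeadCount // no Grouped-Query-Attention
	}
	// Key and value head sizes default to EmbeddingLength / AttentionHeadCount.
	headDim := a.EmbeddingLength / a.AttentionHeadCount
	// One key row plus one value row per token and block.
	perTokenPerBlock := 2 * headDim * headKV
	return ctx * a.BlockCount * perTokenPerBlock * bytesPerElem
}

func main() {
	// Hypothetical model: 32 blocks, 4096-wide embeddings, 32 heads, 8 KV heads.
	a := arch{EmbeddingLength: 4096, BlockCount: 32, AttentionHeadCount: 32, AttentionHeadCountKV: 8}
	fmt.Println(kvCacheBytes(a, 8192, 2)) // 1073741824 (1 GiB)
}
```

With Grouped-Query-Attention the cache shrinks in proportion to AttentionHeadCountKV / AttentionHeadCount, which is why that field matters for memory estimates.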

type GGUFBitsPerWeightScalar ¶

type GGUFBitsPerWeightScalar float64

GGUFBitsPerWeightScalar is the scalar for bits per weight.

func (GGUFBitsPerWeightScalar) String ¶

func (s GGUFBitsPerWeightScalar) String() string

type GGUFBytesScalar ¶

type GGUFBytesScalar uint64

GGUFBytesScalar is the scalar for bytes.

func (GGUFBytesScalar) String ¶

func (s GGUFBytesScalar) String() string

type GGUFFile ¶

type GGUFFile struct {

	// Header is the header of the GGUF file.
	Header GGUFHeader `json:"header"`
	// TensorInfos are the tensor infos of the GGUF file,
	// the size of TensorInfos is equal to `Header.TensorCount`.
	TensorInfos GGUFTensorInfos `json:"tensorInfos"`
	// Padding is the padding size of the GGUF file,
	// which is used to split Header and TensorInfos from tensor data.
	Padding int64 `json:"padding"`
	// TensorDataStartOffset is the offset in bytes of the tensor data in this file.
	//
	// The offset is relative to the start of the file.
	TensorDataStartOffset int64 `json:"tensorDataStartOffset"`

	// Size is the size of the GGUF file.
	Size GGUFBytesScalar `json:"size"`
	// ModelSize is the size of the model when loading.
	ModelSize GGUFBytesScalar `json:"modelSize"`
	// ModelParameters is the number of the model parameters.
	ModelParameters GGUFParametersScalar `json:"modelParameters"`
	// ModelBitsPerWeight is the bits per weight of the model,
	// which describes how many bits are used to store a weight,
	// higher is better.
	ModelBitsPerWeight GGUFBitsPerWeightScalar `json:"modelBitsPerWeight"`
}

GGUFFile represents a GGUF file, see https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#file-structure.

Compared with the complete GGUF file, this structure lacks the tensor data part.
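As a worked example of the ModelBitsPerWeight field, assuming it is derived from ModelSize and ModelParameters as bytes times eight divided by the parameter count. The function name and the numbers are hypothetical.

```go
package main

import "fmt"

// bitsPerWeight returns the average number of bits used to store one parameter.
func bitsPerWeight(modelSizeBytes, parameters uint64) float64 {
	if parameters == 0 {
		return 0
	}
	return float64(modelSizeBytes) * 8 / float64(parameters)
}

func main() {
	// Hypothetical model: 8 billion parameters stored in 4.5 GB.
	fmt.Printf("%.2f bpw\n", bitsPerWeight(4_500_000_000, 8_000_000_000)) // 4.50 bpw
}
```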

func ParseGGUFFile ¶

func ParseGGUFFile(path string, opts ...GGUFReadOption) (*GGUFFile, error)

ParseGGUFFile parses a GGUF file from the given local path, and returns the GGUFFile, or an error if any.

func ParseGGUFFileFromHuggingFace ¶

func ParseGGUFFileFromHuggingFace(ctx context.Context, repo, file string, opts ...GGUFReadOption) (*GGUFFile, error)

ParseGGUFFileFromHuggingFace parses a GGUF file from Hugging Face(https://huggingface.co/), and returns a GGUFFile, or an error if any.

func ParseGGUFFileFromModelScope ¶

func ParseGGUFFileFromModelScope(ctx context.Context, repo, file string, opts ...GGUFReadOption) (*GGUFFile, error)

ParseGGUFFileFromModelScope parses a GGUF file from ModelScope (https://modelscope.cn/), and returns a GGUFFile, or an error if any.

func ParseGGUFFileFromOllama ¶

func ParseGGUFFileFromOllama(ctx context.Context, model string, opts ...GGUFReadOption) (*GGUFFile, error)

ParseGGUFFileFromOllama parses a GGUF file from Ollama model's base layer, and returns a GGUFFile, or an error if any.

func ParseGGUFFileFromOllamaModel ¶

func ParseGGUFFileFromOllamaModel(ctx context.Context, model *OllamaModel, opts ...GGUFReadOption) (gf *GGUFFile, err error)

ParseGGUFFileFromOllamaModel is similar to ParseGGUFFileFromOllama, but inputs an OllamaModel instead of a string.

The given OllamaModel will be completed (its MediaType, Config and Layers are fetched) after calling this function.

func ParseGGUFFileRemote ¶

func ParseGGUFFileRemote(ctx context.Context, url string, opts ...GGUFReadOption) (gf *GGUFFile, err error)

ParseGGUFFileRemote parses a GGUF file from a remote URL, and returns a GGUFFile, or an error if any.
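Reading metadata without downloading the whole model generally relies on HTTP Range requests. The sketch below shows the idea with the standard library only: it serves a fake file from a local test server and fetches just the first four bytes. fetchRange and demo are hypothetical helpers, and this is not necessarily how this package issues its requests.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchRange downloads bytes [start, end] of url using an HTTP Range request.
func fetchRange(ctx context.Context, url string, start, end int64) ([]byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	// A server that supports ranges answers with 206 Partial Content.
	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("server ignored the range request: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// demo serves a fake model of about 1 MiB locally and reads only its first 4 bytes.
func demo() (string, error) {
	blob := append([]byte("GGUF"), make([]byte, 1<<20)...)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// http.ServeContent handles the Range header for us.
		http.ServeContent(w, r, "model.gguf", time.Time{}, bytes.NewReader(blob))
	}))
	defer srv.Close()
	b, err := fetchRange(context.Background(), srv.URL, 0, 3)
	return string(b), err
}

func main() {
	magic, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(magic) // GGUF
}
```

Servers that do not honour Range reply with the full body and status 200, which is why the sketch treats anything other than 206 as an error.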

func (*GGUFFile) Architecture ¶

func (gf *GGUFFile) Architecture() (ga GGUFArchitectureMetadata)

Architecture returns the architecture metadata of the GGUF file.

func (*GGUFFile) EstimateLLaMACppUsage ¶

func (gf *GGUFFile) EstimateLLaMACppUsage(opts ...LLaMACppUsageEstimateOption) (e LLaMACppUsageEstimate)

EstimateLLaMACppUsage returns the estimated inference memory usage of the GGUF file.

func (*GGUFFile) Layers ¶

func (gf *GGUFFile) Layers(ignores ...string) GGUFLayerTensorInfos

Layers converts the GGUFTensorInfos to GGUFLayerTensorInfos.

func (*GGUFFile) Model ¶

func (gf *GGUFFile) Model() (gm GGUFModelMetadata)

Model returns the model metadata of the GGUF file.

func (*GGUFFile) Tokenizer ¶

func (gf *GGUFFile) Tokenizer() (gt GGUFTokenizerMetadata)

Tokenizer returns the tokenizer metadata of a GGUF file.

type GGUFFileCache ¶

type GGUFFileCache string

func (GGUFFileCache) Delete ¶

func (c GGUFFileCache) Delete(key string) error

func (GGUFFileCache) Get ¶

func (c GGUFFileCache) Get(key string, exp time.Duration) (*GGUFFile, error)

func (GGUFFileCache) Put ¶

func (c GGUFFileCache) Put(key string, gf *GGUFFile) error

type GGUFFileType ¶

type GGUFFileType uint32

GGUFFileType is a type of GGUF file, see https://github.com/ggerganov/llama.cpp/blob/278d0e18469aacf505be18ce790a63c7cc31be26/ggml/include/ggml.h#L404-L433.

const (
	GGUFFileTypeAllF32         GGUFFileType = iota // F32
	GGUFFileTypeMostlyF16                          // F16
	GGUFFileTypeMostlyQ4_0                         // Q4_0
	GGUFFileTypeMostlyQ4_1                         // Q4_1
	GGUFFileTypeMostlyQ4_1_F16                     // Q4_1_F16
	GGUFFileTypeMostlyQ4_2                         // Q4_2
	GGUFFileTypeMostlyQ4_3                         // Q4_3
	GGUFFileTypeMostlyQ8_0                         // Q8_0
	GGUFFileTypeMostlyQ5_0                         // Q5_0
	GGUFFileTypeMostlyQ5_1                         // Q5_1
	GGUFFileTypeMostlyQ2_K                         // Q2_K
	GGUFFileTypeMostlyQ3_K                         // Q3_K/Q3_K_S
	GGUFFileTypeMostlyQ4_K                         // Q4_K/Q3_K_M
	GGUFFileTypeMostlyQ5_K                         // Q5_K/Q3_K_L
	GGUFFileTypeMostlyQ6_K                         // Q6_K/Q4_K_S
	GGUFFileTypeMostlyIQ2_XXS                      // IQ2_XXS/Q4_K_M
	GGUFFileTypeMostlyIQ2_XS                       // IQ2_XS/Q5_K_S
	GGUFFileTypeMostlyIQ3_XXS                      // IQ3_XXS/Q5_K_M
	GGUFFileTypeMostlyIQ1_S                        // IQ1_S/Q6_K
	GGUFFileTypeMostlyIQ4_NL                       // IQ4_NL
	GGUFFileTypeMostlyIQ3_S                        // IQ3_S
	GGUFFileTypeMostlyIQ2_S                        // IQ2_S
	GGUFFileTypeMostlyIQ4_XS                       // IQ4_XS
	GGUFFileTypeMostlyIQ1_M                        // IQ1_M
	GGUFFileTypeMostlyBF16                         // BF16
	GGUFFileTypeMostlyQ4_0_4_4                     // Q4_0_4x4
	GGUFFileTypeMostlyQ4_0_4_8                     // Q4_0_4x8
	GGUFFileTypeMostlyQ4_0_8_8                     // Q4_0_8x8

)

GGUFFileType constants.

GGUFFileTypeMostlyQ4_2, GGUFFileTypeMostlyQ4_3 are deprecated.

GGUFFileTypeMostlyQ4_1_F16 is a special case where the majority of the tensors are Q4_1, but 'token_embd.weight' and 'output.weight' tensors are F16.

func (GGUFFileType) GGMLType ¶

func (t GGUFFileType) GGMLType() GGMLType

GGMLType returns the GGMLType of the GGUFFileType, which is inspired by https://github.com/ggerganov/ggml/blob/a10a8b880c059b3b29356eb9a9f8df72f03cdb6a/src/ggml.c#L2730-L2763.

func (GGUFFileType) String ¶

func (i GGUFFileType) String() string

type GGUFFilename ¶

type GGUFFilename struct {
	ModelName      string `json:"modelName"`
	Major          *int   `json:"major"`
	Minor          *int   `json:"minor"`
	ExpertsCount   *int   `json:"expertsCount,omitempty"`
	Parameters     string `json:"parameters"`
	EncodingScheme string `json:"encodingScheme"`
	Shard          *int   `json:"shard,omitempty"`
	ShardTotal     *int   `json:"shardTotal,omitempty"`
}

GGUFFilename represents a GGUF filename, see https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention.

func ParseGGUFFilename ¶

func ParseGGUFFilename(name string) *GGUFFilename

ParseGGUFFilename parses the given GGUF filename string, and returns the GGUFFilename, or nil if the filename is invalid.

func (GGUFFilename) IsPreRelease ¶

func (gn GGUFFilename) IsPreRelease() bool

func (GGUFFilename) IsSharding ¶

func (gn GGUFFilename) IsSharding() bool

func (GGUFFilename) String ¶

func (gn GGUFFilename) String() string

type GGUFHeader ¶

type GGUFHeader struct {
	// Magic is a magic number that announces that this is a GGUF file.
	Magic GGUFMagic `json:"magic"`
	// Version is a version of the GGUF file format.
	Version GGUFVersion `json:"version"`
	// TensorCount is the number of tensors in the file.
	TensorCount uint64 `json:"tensorCount"`
	// MetadataKVCount is the number of key-value pairs in the metadata.
	MetadataKVCount uint64 `json:"metadataKVCount"`
	// MetadataKV holds the key-value pairs in the metadata.
	MetadataKV GGUFMetadataKVs `json:"metadataKV"`
}

GGUFHeader represents the header of a GGUF file.

type GGUFLayerTensorInfos ¶

type GGUFLayerTensorInfos []IGGUFTensorInfos

GGUFLayerTensorInfos represents the hierarchical tensor infos of a GGUF file. It can hold GGUFNamedTensorInfos, GGUFTensorInfos, and GGUFTensorInfo items.

func (GGUFLayerTensorInfos) Bytes ¶

func (ltis GGUFLayerTensorInfos) Bytes() uint64

Bytes returns the number of bytes of the GGUFLayerTensorInfos.

func (GGUFLayerTensorInfos) Count ¶

func (ltis GGUFLayerTensorInfos) Count() uint64

Count returns the number of GGUF tensors of the GGUFLayerTensorInfos.

func (GGUFLayerTensorInfos) Cut ¶

func (ltis GGUFLayerTensorInfos) Cut(names []string) (before, after GGUFLayerTensorInfos, found bool)

Cut splits the GGUFLayerTensorInfos into two parts: before holds the items whose names match the given names, and after holds the remaining items. found is true if items with the given names were found, and false otherwise.
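
The partition described for Cut can be sketched on a flat list of names. This is a simplified analogue and not the package's implementation: `cut` is a hypothetical helper, it ignores nesting, and it assumes that found means at least one name matched.

```go
package main

import "fmt"

// cut is a simplified, hypothetical analogue of Cut that works on a flat
// list of names instead of a tensor tree.
func cut(items, names []string) (before, after []string, found bool) {
	want := make(map[string]struct{}, len(names))
	for _, n := range names {
		want[n] = struct{}{}
	}
	for _, it := range items {
		if _, ok := want[it]; ok {
			before = append(before, it) // matched: goes to the first part
		} else {
			after = append(after, it) // unmatched: goes to the second part
		}
	}
	// Assumption: found means at least one item matched.
	return before, after, len(before) > 0
}

func main() {
	items := []string{"token_embd", "blk.0", "blk.1", "output"}
	before, after, found := cut(items, []string{"token_embd", "output"})
	fmt.Println(before, after, found) // [token_embd output] [blk.0 blk.1] true
}
```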

func (GGUFLayerTensorInfos) Elements ¶

func (ltis GGUFLayerTensorInfos) Elements() uint64

Elements returns the number of elements of the GGUFLayerTensorInfos.

func (GGUFLayerTensorInfos) Get ¶

func (ltis GGUFLayerTensorInfos) Get(name string) (info GGUFTensorInfo, found bool)

Get returns the GGUFTensorInfo with the given name, and true if found, and false otherwise.

func (GGUFLayerTensorInfos) Index ¶

func (ltis GGUFLayerTensorInfos) Index(names []string) (infos map[string]GGUFTensorInfo, found int)

Index returns a map from name to GGUFTensorInfo for the given names, and the number of names found.

func (GGUFLayerTensorInfos) Search ¶

func (ltis GGUFLayerTensorInfos) Search(nameRegex *regexp.Regexp) (infos []GGUFTensorInfo)

Search returns a list of GGUFTensorInfo with the names that match the given regex.

type GGUFMagic ¶

type GGUFMagic uint32

GGUFMagic is a magic number of GGUF file, see https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#historical-state-of-affairs.

const (
	GGUFMagicGGML   GGUFMagic = 0x67676d6c
	GGUFMagicGGMF   GGUFMagic = 0x67676d66
	GGUFMagicGGJT   GGUFMagic = 0x67676a74
	GGUFMagicGGUFLe GGUFMagic = 0x46554747 // GGUF
	GGUFMagicGGUFBe GGUFMagic = 0x47475546 // GGUF
)

GGUFMagic constants.

func (GGUFMagic) String ¶

func (i GGUFMagic) String() string

type GGUFMetadataKV ¶

type GGUFMetadataKV struct {
	// Key is the key of the metadata key-value pair,
	// which is no larger than 64 bytes long.
	Key string `json:"key"`
	// ValueType is the type of the metadata value.
	ValueType GGUFMetadataValueType `json:"valueType"`
	// Value is the value of the metadata key-value pair.
	Value any `json:"value"`
}

GGUFMetadataKV is a key-value pair in the metadata of a GGUF file.

func (GGUFMetadataKV) ValueArray ¶

func (kv GGUFMetadataKV) ValueArray() GGUFMetadataKVArrayValue

func (GGUFMetadataKV) ValueBool ¶

func (kv GGUFMetadataKV) ValueBool() bool

func (GGUFMetadataKV) ValueFloat32 ¶

func (kv GGUFMetadataKV) ValueFloat32() float32

func (GGUFMetadataKV) ValueFloat64 ¶

func (kv GGUFMetadataKV) ValueFloat64() float64

func (GGUFMetadataKV) ValueInt16 ¶

func (kv GGUFMetadataKV) ValueInt16() int16

func (GGUFMetadataKV) ValueInt32 ¶

func (kv GGUFMetadataKV) ValueInt32() int32

func (GGUFMetadataKV) ValueInt64 ¶

func (kv GGUFMetadataKV) ValueInt64() int64

func (GGUFMetadataKV) ValueInt8 ¶

func (kv GGUFMetadataKV) ValueInt8() int8

func (GGUFMetadataKV) ValueString ¶

func (kv GGUFMetadataKV) ValueString() string

func (GGUFMetadataKV) ValueUint16 ¶

func (kv GGUFMetadataKV) ValueUint16() uint16

func (GGUFMetadataKV) ValueUint32 ¶

func (kv GGUFMetadataKV) ValueUint32() uint32

func (GGUFMetadataKV) ValueUint64 ¶

func (kv GGUFMetadataKV) ValueUint64() uint64

func (GGUFMetadataKV) ValueUint8 ¶

func (kv GGUFMetadataKV) ValueUint8() uint8

type GGUFMetadataKVArrayValue ¶

type GGUFMetadataKVArrayValue struct {

	// Type is the type of the array item.
	Type GGUFMetadataValueType `json:"type"`
	// Len is the length of the array.
	Len uint64 `json:"len"`
	// Array holds all array items.
	Array []any `json:"array,omitempty"`

	// StartOffset is the offset in bytes of the GGUFMetadataKVArrayValue in the GGUFFile file.
	//
	// The offset is relative to the start of the file.
	StartOffset int64 `json:"startOffset"`

	// Size is the size of the array in bytes.
	Size int64 `json:"size"`
}

GGUFMetadataKVArrayValue is a value of a GGUFMetadataKV with type GGUFMetadataValueTypeArray.

func (GGUFMetadataKVArrayValue) ValuesArray ¶

func (av GGUFMetadataKVArrayValue) ValuesArray() []GGUFMetadataKVArrayValue

func (GGUFMetadataKVArrayValue) ValuesBool ¶

func (av GGUFMetadataKVArrayValue) ValuesBool() []bool

func (GGUFMetadataKVArrayValue) ValuesFloat32 ¶

func (av GGUFMetadataKVArrayValue) ValuesFloat32() []float32

func (GGUFMetadataKVArrayValue) ValuesFloat64 ¶

func (av GGUFMetadataKVArrayValue) ValuesFloat64() []float64

func (GGUFMetadataKVArrayValue) ValuesInt16 ¶

func (av GGUFMetadataKVArrayValue) ValuesInt16() []int16

func (GGUFMetadataKVArrayValue) ValuesInt32 ¶

func (av GGUFMetadataKVArrayValue) ValuesInt32() []int32

func (GGUFMetadataKVArrayValue) ValuesInt64 ¶

func (av GGUFMetadataKVArrayValue) ValuesInt64() []int64

func (GGUFMetadataKVArrayValue) ValuesInt8 ¶

func (av GGUFMetadataKVArrayValue) ValuesInt8() []int8

func (GGUFMetadataKVArrayValue) ValuesString ¶

func (av GGUFMetadataKVArrayValue) ValuesString() []string

func (GGUFMetadataKVArrayValue) ValuesUint16 ¶

func (av GGUFMetadataKVArrayValue) ValuesUint16() []uint16

func (GGUFMetadataKVArrayValue) ValuesUint32 ¶

func (av GGUFMetadataKVArrayValue) ValuesUint32() []uint32

func (GGUFMetadataKVArrayValue) ValuesUint64 ¶

func (av GGUFMetadataKVArrayValue) ValuesUint64() []uint64

func (GGUFMetadataKVArrayValue) ValuesUint8 ¶

func (av GGUFMetadataKVArrayValue) ValuesUint8() []uint8

type GGUFMetadataKVs ¶

type GGUFMetadataKVs []GGUFMetadataKV

GGUFMetadataKVs is a list of GGUFMetadataKV.

func (GGUFMetadataKVs) Get ¶

func (kvs GGUFMetadataKVs) Get(key string) (value GGUFMetadataKV, found bool)

Get returns the GGUFMetadataKV with the given key, and true if found, and false otherwise.

func (GGUFMetadataKVs) Index ¶

func (kvs GGUFMetadataKVs) Index(keys []string) (values map[string]GGUFMetadataKV, found int)

Index returns a map from key to GGUFMetadataKV for the given keys, and the number of keys found.

func (GGUFMetadataKVs) Search ¶

func (kvs GGUFMetadataKVs) Search(keyRegex *regexp.Regexp) (values []GGUFMetadataKV)

Search returns a list of GGUFMetadataKV with the keys that match the given regex.

type GGUFMetadataValueType ¶

type GGUFMetadataValueType uint32

GGUFMetadataValueType is a type of GGUF metadata value, see https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#file-structure.

const (
	GGUFMetadataValueTypeUint8 GGUFMetadataValueType = iota
	GGUFMetadataValueTypeInt8
	GGUFMetadataValueTypeUint16
	GGUFMetadataValueTypeInt16
	GGUFMetadataValueTypeUint32
	GGUFMetadataValueTypeInt32
	GGUFMetadataValueTypeFloat32
	GGUFMetadataValueTypeBool
	GGUFMetadataValueTypeString
	GGUFMetadataValueTypeArray
	GGUFMetadataValueTypeUint64
	GGUFMetadataValueTypeInt64
	GGUFMetadataValueTypeFloat64
)

GGUFMetadataValueType constants.
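
Most of these value types have a fixed encoded width, while strings and arrays are length-prefixed and variable. The sketch below mirrors the constant order with a hypothetical local enum and maps each fixed-width type to its size in bytes, following the GGUF file-structure description. `valueType` and `fixedSize` are illustrative names, not part of this package.

```go
package main

import "fmt"

// valueType is a hypothetical mirror of GGUFMetadataValueType.
type valueType uint32

const (
	typeUint8 valueType = iota
	typeInt8
	typeUint16
	typeInt16
	typeUint32
	typeInt32
	typeFloat32
	typeBool
	typeString
	typeArray
	typeUint64
	typeInt64
	typeFloat64
)

// fixedSize returns the encoded size in bytes of a fixed-width value type,
// or false for strings and arrays, whose size depends on their content.
func fixedSize(t valueType) (int, bool) {
	switch t {
	case typeUint8, typeInt8, typeBool:
		return 1, true
	case typeUint16, typeInt16:
		return 2, true
	case typeUint32, typeInt32, typeFloat32:
		return 4, true
	case typeUint64, typeInt64, typeFloat64:
		return 8, true
	default:
		return 0, false
	}
}

func main() {
	n, ok := fixedSize(typeFloat32)
	fmt.Println(n, ok) // 4 true
	_, ok = fixedSize(typeString)
	fmt.Println(ok) // false
}
```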

func (GGUFMetadataValueType) String ¶

func (i GGUFMetadataValueType) String() string

type GGUFModelMetadata ¶

type GGUFModelMetadata struct {

	// Architecture describes what architecture this model implements.
	//
	// All lowercase ASCII, with only [a-z0-9]+ characters allowed.
	Architecture string `json:"architecture"`
	// QuantizationVersion describes the version of the quantization format.
	//
	// Not required if the model is not quantized (i.e. no tensors are quantized).
	// If any tensors are quantized, this must be present.
	// This is separate from the quantization scheme of the tensors themselves,
	// the quantization version may change without changing the scheme's name,
	// e.g. the quantization scheme is Q5_K, and the QuantizationVersion is 4.
	QuantizationVersion uint32 `json:"quantizationVersion,omitempty"`
	// Alignment describes the alignment of the GGUF file.
	//
	// This can vary to allow for different alignment schemes, but it must be a multiple of 8.
	// Some writers may not write the alignment.
	//
	// Default is 32.
	Alignment uint32 `json:"alignment"`
	// Name of the model.
	//
	// This should be a human-readable name that can be used to identify the model.
	// It should be unique within the community that the model is defined in.
	Name string `json:"name"`
	// Author of the model.
	Author string `json:"author,omitempty"`
	// URL to the model's homepage.
	//
	// This can be a GitHub repo, a paper, etc.
	URL string `json:"url,omitempty"`
	// Description of the model.
	Description string `json:"description,omitempty"`
	// License of the model.
	//
	// This is expressed as a SPDX license expression, e.g. "MIT OR Apache-2.0".
	License string `json:"license,omitempty"`
	// FileType describes the type of the majority of the tensors in the GGUF file.
	FileType GGUFFileType `json:"fileType"`

	// LittleEndian is true if the GGUF file is little-endian,
	// and false for big-endian.
	LittleEndian bool `json:"littleEndian"`
	// FileSize is the size of the GGUF file in bytes.
	FileSize GGUFBytesScalar `json:"fileSize"`
	// Size is the model size.
	Size GGUFBytesScalar `json:"size"`
	// Parameters is the parameters of the model.
	Parameters GGUFParametersScalar `json:"parameters"`
	// BitsPerWeight is the bits per weight of the model.
	BitsPerWeight GGUFBitsPerWeightScalar `json:"bitsPerWeight"`
}

GGUFModelMetadata represents the model metadata of a GGUF file.
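
To make the Alignment field concrete: the GGUF format pads offsets up to the next multiple of the alignment. The sketch below shows that rounding with the default of 32. `alignOffset` is a hypothetical helper for illustration; how this package applies Alignment internally is not shown here.

```go
package main

import "fmt"

// alignOffset rounds offset up to the next multiple of alignment.
// It assumes alignment is non-zero.
func alignOffset(offset, alignment uint64) uint64 {
	return offset + (alignment-offset%alignment)%alignment
}

func main() {
	fmt.Println(alignOffset(100, 32)) // 128
	fmt.Println(alignOffset(128, 32)) // 128
}
```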

type GGUFNamedTensorInfos ¶

type GGUFNamedTensorInfos struct {
	// Name is the name of the namespace.
	Name string `json:"name"`
	// GGUFLayerTensorInfos can save GGUFNamedTensorInfos, GGUFTensorInfos, or GGUFTensorInfo.
	//
	// If the item is type of GGUFTensorInfo, it must be the leaf node.
	//
	// Any branch nodes are type of GGUFNamedTensorInfos or GGUFTensorInfos,
	// which can be nested.
	//
	// Branch nodes are stored as pointers.
	GGUFLayerTensorInfos `json:"items,omitempty"`
}

GGUFNamedTensorInfos is the namespace for relevant tensors, which must have a name.

type GGUFParametersScalar ¶

type GGUFParametersScalar uint64

GGUFParametersScalar is the scalar for parameters.

func (GGUFParametersScalar) String ¶

func (s GGUFParametersScalar) String() string

type GGUFReadOption ¶

type GGUFReadOption func(o *_GGUFReadOptions)
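
The signature above is Go's functional-options pattern: each option is a function that mutates an options struct. The sketch below reproduces the pattern with hypothetical local types, since `_GGUFReadOptions` is unexported. The field names and the default buffer size are assumptions made for illustration only.

```go
package main

import "fmt"

// readOptions is a hypothetical stand-in for the unexported _GGUFReadOptions.
type readOptions struct {
	mmap       bool
	bufferSize int
}

// readOption mirrors the shape of GGUFReadOption.
type readOption func(*readOptions)

func useMMap() readOption {
	return func(o *readOptions) { o.mmap = true }
}

func useBufferSize(n int) readOption {
	return func(o *readOptions) { o.bufferSize = n }
}

// apply starts from defaults and runs each option in order,
// so later options override earlier ones.
func apply(opts ...readOption) readOptions {
	o := readOptions{bufferSize: 4096} // assumed default, for illustration
	for _, opt := range opts {
		if opt != nil {
			opt(&o)
		}
	}
	return o
}

func main() {
	o := apply(useMMap(), useBufferSize(1<<20))
	fmt.Println(o.mmap, o.bufferSize) // true 1048576
}
```

Callers of the real package pass options the same way, for example as trailing arguments to ParseGGUFFile.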

func SkipCache ¶

func SkipCache() GGUFReadOption

SkipCache skips the cache when reading from remote.

func SkipDNSCache ¶

func SkipDNSCache() GGUFReadOption

SkipDNSCache skips the DNS cache when reading from remote.

func SkipLargeMetadata ¶

func SkipLargeMetadata() GGUFReadOption

SkipLargeMetadata skips reading large GGUFMetadataKV items, which are not necessary for most cases.

func SkipProxy ¶

func SkipProxy() GGUFReadOption

SkipProxy skips the proxy when reading from remote.

func SkipRangeDownloadDetection ¶

func SkipRangeDownloadDetection() GGUFReadOption

SkipRangeDownloadDetection skips the range download detection when reading from remote.

func SkipTLSVerification ¶

func SkipTLSVerification() GGUFReadOption

SkipTLSVerification skips the TLS verification when reading from remote.

func UseBearerAuth ¶

func UseBearerAuth(token string) GGUFReadOption

UseBearerAuth uses the given token as a bearer auth when reading from remote.

func UseBufferSize ¶

func UseBufferSize(size int) GGUFReadOption

UseBufferSize sets the buffer size when reading from remote.

func UseCache ¶

func UseCache() GGUFReadOption

UseCache caches the remote reading result.

func UseCacheExpiration ¶

func UseCacheExpiration(expiration time.Duration) GGUFReadOption

UseCacheExpiration uses the given expiration to cache the remote reading result.

Disable cache expiration by setting it to 0.

func UseCachePath ¶

func UseCachePath(path string) GGUFReadOption

UseCachePath uses the given path to cache the remote reading result.

func UseDebug ¶

func UseDebug() GGUFReadOption

UseDebug uses debug mode to read the file.

func UseMMap ¶

func UseMMap() GGUFReadOption

UseMMap uses mmap to read the local file.

func UseProxy ¶

func UseProxy(url *url.URL) GGUFReadOption

UseProxy uses the given url as a proxy when reading from remote.

type GGUFTensorInfo ¶

type GGUFTensorInfo struct {

	// Name is the name of the tensor,
	// which is no larger than 64 bytes long.
	Name string `json:"name"`
	// NDimensions is the number of dimensions of the tensor.
	NDimensions uint32 `json:"nDimensions"`
	// Dimensions is the dimensions of the tensor,
	// the length is NDimensions.
	Dimensions []uint64 `json:"dimensions"`
	// Type is the type of the tensor.
	Type GGMLType `json:"type"`
	// Offset is the offset in bytes of the tensor's data in this file.
	//
	// The offset is relative to tensor data, not to the start of the file.
	Offset uint64 `json:"offset"`

	// StartOffset is the offset in bytes of the GGUFTensorInfo in the GGUFFile file.
	//
	// The offset is relative to the start of the file.
	StartOffset int64 `json:"startOffset"`
}

GGUFTensorInfo represents a tensor info in a GGUF file.

func (GGUFTensorInfo) Bytes ¶

func (ti GGUFTensorInfo) Bytes() uint64

Bytes returns the number of bytes of the GGUFTensorInfo, which is inspired by https://github.com/ggerganov/ggml/blob/a10a8b880c059b3b29356eb9a9f8df72f03cdb6a/src/ggml.c#L2609-L2626.

func (GGUFTensorInfo) Count ¶

func (ti GGUFTensorInfo) Count() uint64

Count returns the number of GGUF tensors of the GGUFTensorInfo, which is always 1.

func (GGUFTensorInfo) Elements ¶

func (ti GGUFTensorInfo) Elements() uint64

Elements returns the number of elements of the GGUFTensorInfo, which is inspired by https://github.com/ggerganov/ggml/blob/a10a8b880c059b3b29356eb9a9f8df72f03cdb6a/src/ggml.c#L2597-L2601.
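
For a contiguous tensor, the element count is the product of the dimensions, and the byte size follows from how the tensor type packs elements into blocks. The sketch below is a simplified version of that arithmetic with a few well-known GGML layouts. `typeTrait`, `elements` and `sizeInBytes` are hypothetical names, and the sketch assumes the element count is a multiple of the block size.

```go
package main

import "fmt"

// typeTrait describes a tensor type: blockSize elements occupy typeSize bytes.
type typeTrait struct {
	blockSize uint64
	typeSize  uint64
}

// A few well-known GGML layouts.
var traits = map[string]typeTrait{
	"F32":  {1, 4},
	"F16":  {1, 2},
	"Q4_0": {32, 18},
	"Q8_0": {32, 34},
}

// elements multiplies all dimensions together.
func elements(dims []uint64) uint64 {
	n := uint64(1)
	for _, d := range dims {
		n *= d
	}
	return n
}

// sizeInBytes assumes a contiguous tensor whose element count is a
// multiple of the block size.
func sizeInBytes(dims []uint64, t typeTrait) uint64 {
	return elements(dims) / t.blockSize * t.typeSize
}

func main() {
	dims := []uint64{4096, 4096}
	fmt.Println(elements(dims))                    // 16777216
	fmt.Println(sizeInBytes(dims, traits["F16"]))  // 33554432
	fmt.Println(sizeInBytes(dims, traits["Q4_0"])) // 9437184
}
```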

func (GGUFTensorInfo) Get ¶

func (ti GGUFTensorInfo) Get(name string) (info GGUFTensorInfo, found bool)

Get returns the GGUFTensorInfo with the given name, and true if found, and false otherwise.

func (GGUFTensorInfo) Index ¶

func (ti GGUFTensorInfo) Index(names []string) (infos map[string]GGUFTensorInfo, found int)

Index returns a map from name to GGUFTensorInfo for the given names, and the number of names found.

func (GGUFTensorInfo) Search ¶

func (ti GGUFTensorInfo) Search(nameRegex *regexp.Regexp) (infos []GGUFTensorInfo)

Search returns a list of GGUFTensorInfo with the names that match the given regex.

type GGUFTensorInfos ¶

type GGUFTensorInfos []GGUFTensorInfo

GGUFTensorInfos is a list of GGUFTensorInfo.

func (GGUFTensorInfos) Bytes ¶

func (tis GGUFTensorInfos) Bytes() uint64

Bytes returns the number of bytes of the GGUFTensorInfos.

func (GGUFTensorInfos) Count ¶

func (tis GGUFTensorInfos) Count() uint64

Count returns the number of GGUF tensors of the GGUFTensorInfos.

func (GGUFTensorInfos) Elements ¶

func (tis GGUFTensorInfos) Elements() uint64

Elements returns the number of elements of the GGUFTensorInfos.

func (GGUFTensorInfos) Get ¶

func (tis GGUFTensorInfos) Get(name string) (info GGUFTensorInfo, found bool)

Get returns the GGUFTensorInfo with the given name, and true if found, and false otherwise.

func (GGUFTensorInfos) Index ¶

func (tis GGUFTensorInfos) Index(names []string) (infos map[string]GGUFTensorInfo, found int)

Index returns a map from name to GGUFTensorInfo for the given names, and the number of names found.

func (GGUFTensorInfos) Search ¶

func (tis GGUFTensorInfos) Search(nameRegex *regexp.Regexp) (infos []GGUFTensorInfo)

Search returns a list of GGUFTensorInfo with the names that match the given regex.

type GGUFTokenizerMetadata ¶

type GGUFTokenizerMetadata struct {

	// Model is the model of the tokenizer.
	Model string `json:"model"`
	// TokensLength is the size of tokens.
	TokensLength uint64 `json:"tokensLength"`
	// MergesLength is the size of merges.
	MergesLength uint64 `json:"mergesLength"`
	// AddedTokensLength is the size of added tokens after training.
	AddedTokensLength uint64 `json:"addedTokenLength"`
	// BOSTokenID is the ID of the beginning of sentence token.
	//
	// Use -1 if the token is not found.
	BOSTokenID int64 `json:"bosTokenID"`
	// EOSTokenID is the ID of the end of sentence token.
	//
	// Use -1 if the token is not found.
	EOSTokenID int64 `json:"eosTokenID"`
	// EOTTokenID is the ID of the end of text token.
	//
	// Use -1 if the token is not found.
	EOTTokenID int64 `json:"eotTokenID"`
	// EOMTokenID is the ID of the end of message token.
	//
	// Use -1 if the token is not found.
	EOMTokenID int64 `json:"eomTokenID"`
	// UnknownTokenID is the ID of the unknown token.
	//
	// Use -1 if the token is not found.
	UnknownTokenID int64 `json:"unknownTokenID"`
	// SeparatorTokenID is the ID of the separator token.
	//
	// Use -1 if the token is not found.
	SeparatorTokenID int64 `json:"separatorTokenID"`
	// PaddingTokenID is the ID of the padding token.
	//
	// Use -1 if the token is not found.
	PaddingTokenID int64 `json:"paddingTokenID"`

	// TokensSize is the size of tokens in bytes.
	TokensSize int64 `json:"tokensSize"`
	// MergesSize is the size of merges in bytes.
	MergesSize int64 `json:"mergesSize"`
}

GGUFTokenizerMetadata represents the tokenizer metadata of a GGUF file.

type GGUFVersion ¶

type GGUFVersion uint32

GGUFVersion is a version of GGUF file format, see https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#version-history.

const (
	GGUFVersionV1 GGUFVersion = iota + 1
	GGUFVersionV2
	GGUFVersionV3
)

GGUFVersion constants.

func (GGUFVersion) String ¶

func (i GGUFVersion) String() string

type IGGUFTensorInfos ¶

type IGGUFTensorInfos interface {
	// Get returns the GGUFTensorInfo with the given name,
	// and true if found, and false otherwise.
	Get(name string) (info GGUFTensorInfo, found bool)
	// Search returns a list of GGUFTensorInfo with the names that match the given regex.
	Search(nameRegex *regexp.Regexp) (infos []GGUFTensorInfo)
	// Index returns a map from name to GGUFTensorInfo for the given names,
	// and the number of names found.
	Index(names []string) (infos map[string]GGUFTensorInfo, found int)
	// Elements returns the number of elements (parameters).
	Elements() uint64
	// Bytes returns the number of bytes.
	Bytes() uint64
	// Count returns the number of tensors.
	Count() uint64
}

IGGUFTensorInfos is an interface for GGUF tensor infos, which includes basic operations.
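
Because a single tensor, a list of tensors and a nested layer all satisfy the same interface, aggregate queries can recurse through the tree. The sketch below shows that composite structure with a reduced, hypothetical interface (`node`) and two stand-in types (`leaf`, `group`). It is not the package's code, and the tensor names and shapes are made up.

```go
package main

import "fmt"

// node is a reduced, hypothetical version of IGGUFTensorInfos.
type node interface {
	Count() uint64
	Elements() uint64
}

// leaf stands in for GGUFTensorInfo.
type leaf struct {
	name string
	dims []uint64
}

func (l leaf) Count() uint64 { return 1 }

func (l leaf) Elements() uint64 {
	n := uint64(1)
	for _, d := range l.dims {
		n *= d
	}
	return n
}

// group stands in for GGUFLayerTensorInfos: each item is a leaf or another group.
type group []node

func (g group) Count() uint64 {
	var n uint64
	for _, it := range g {
		n += it.Count()
	}
	return n
}

func (g group) Elements() uint64 {
	var n uint64
	for _, it := range g {
		n += it.Elements()
	}
	return n
}

func main() {
	blk0 := group{
		leaf{"blk.0.attn_q.weight", []uint64{8, 8}},
		leaf{"blk.0.attn_norm.weight", []uint64{8}},
	}
	model := group{leaf{"token_embd.weight", []uint64{8, 16}}, blk0}
	fmt.Println(model.Count(), model.Elements()) // 3 200
}
```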

type LLaMACppComputationUsage ¶

type LLaMACppComputationUsage struct {
	// Footprint is the memory footprint for computation.
	Footprint GGUFBytesScalar `json:"footprint"`
	// Input is the memory usage for input.
	Input GGUFBytesScalar `json:"input"`
	// Compute is the memory usage for computation.
	Compute GGUFBytesScalar `json:"graph"`
	// Output is the memory usage for output.
	Output GGUFBytesScalar `json:"output"`
}

LLaMACppComputationUsage represents the memory usage of computation in llama.cpp.

func (LLaMACppComputationUsage) Sum ¶

func (u LLaMACppComputationUsage) Sum() GGUFBytesScalar

type LLaMACppKVCacheUsage ¶

type LLaMACppKVCacheUsage struct {
	// Key is the memory usage for caching previous keys.
	Key GGUFBytesScalar `json:"key"`
	// Value is the memory usage for caching previous values.
	Value GGUFBytesScalar `json:"value"`
}

LLaMACppKVCacheUsage represents the memory usage of caching previous KV in llama.cpp.
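
As a rough intuition for where the Key and Value figures come from, a common back-of-the-envelope estimate multiplies context size, layer count, KV heads, head dimension and bytes per cached element. The sketch below uses that simplified formula with made-up model dimensions. `kvCacheBytes` is a hypothetical helper, and the package's estimator accounts for more factors (cache types, architecture, offloading), so its results may differ.

```go
package main

import "fmt"

// kvCacheBytes returns a simplified estimate of the key and value cache
// sizes in bytes. It assumes keys and values share the same head dimension
// and element width.
func kvCacheBytes(ctx, layers, kvHeads, headDim, bytesPerElem uint64) (key, value uint64) {
	perCache := ctx * layers * kvHeads * headDim * bytesPerElem
	return perCache, perCache
}

func main() {
	// Hypothetical shape: 4096 context, 32 layers, 32 KV heads of size 128, F16 cache.
	k, v := kvCacheBytes(4096, 32, 32, 128, 2)
	fmt.Println(k, v, k+v) // 1073741824 1073741824 2147483648
}
```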

func (LLaMACppKVCacheUsage) Sum ¶

func (u LLaMACppKVCacheUsage) Sum() GGUFBytesScalar

type LLaMACppMemoryUsage ¶

type LLaMACppMemoryUsage struct {
	// Footprint is the memory footprint for bootstrapping.
	Footprint GGUFBytesScalar `json:"footprint"`
	// Weight is the memory usage of loading weights.
	Weight LLaMACppWeightUsage `json:"weight"`
	// KVCache is the memory usage of caching previous KV.
	KVCache LLaMACppKVCacheUsage `json:"kvCache"`
	// Computation is the memory usage of computation.
	Computation LLaMACppComputationUsage `json:"computation"`
}

LLaMACppMemoryUsage represents the memory usage for expanding the GGUF file in llama.cpp.

type LLaMACppUsageEstimate ¶

type LLaMACppUsageEstimate struct {
	// Architecture describes what architecture this model implements.
	Architecture string `json:"architecture"`
	// FlashAttention indicates whether flash attention is enabled,
	// true for enabled.
	FlashAttention bool `json:"flashAttention"`
	// ContextSize is the size of the context.
	ContextSize uint64 `json:"contextSize"`
	// OffloadLayers is the number of offloaded layers.
	OffloadLayers uint64 `json:"offloadLayers"`
	// FullOffloaded indicates whether the layers are fully offloaded,
	// false for partially offloaded or not offloaded.
	FullOffloaded bool `json:"fullOffloaded"`
	// NoMMap is the flag to indicate whether the file must be loaded without mmap,
	// true for total loaded.
	NoMMap bool `json:"noMMap"`
	// EmbeddingOnly is the flag to indicate whether the model is used for embedding only,
	// true for embedding only.
	EmbeddingOnly bool `json:"embeddingOnly"`
	// Load is the memory usage for running the GGUF file in RAM.
	Load LLaMACppMemoryUsage `json:"load"`
	// Offload is the memory usage for loading the GGUF file in VRAM.
	Offload LLaMACppMemoryUsage `json:"offload"`
	// MultimodalProjector is the memory usage of multimodal projector.
	MultimodalProjector *LLaMACppUsageEstimate `json:"multimodalProjector,omitempty"`
	// Drafter is the memory usage of drafter.
	Drafter *LLaMACppUsageEstimate `json:"drafter,omitempty"`
}

LLaMACppUsageEstimate represents the estimated result of loading the GGUF file in llama.cpp.

func (LLaMACppUsageEstimate) Summarize ¶

func (e LLaMACppUsageEstimate) Summarize(mmap bool, nonUMARamFootprint, nonUMAVramFootprint uint64) (es LLaMACppUsageEstimateSummary)

Summarize returns the summary of the estimated result of loading the GGUF file in llama.cpp. The input options are used to adjust the summary.

func (LLaMACppUsageEstimate) SummarizeMemory ¶

func (e LLaMACppUsageEstimate) SummarizeMemory(mmap bool, nonUMARamFootprint, nonUMAVramFootprint uint64) (ems LLaMACppUsageEstimateMemorySummary)

SummarizeMemory returns the summary of the estimated memory usage of loading the GGUF file in llama.cpp. The input options are used to adjust the summary.

type LLaMACppUsageEstimateMemorySummary ¶

type LLaMACppUsageEstimateMemorySummary struct {
	// OffloadLayers is the number of offloaded layers.
	OffloadLayers uint64 `json:"offloadLayers"`
	// FullOffloaded indicates whether the layers are fully offloaded,
	// false for partially offloaded or not offloaded.
	FullOffloaded bool `json:"fullOffloaded"`
	// UMA represents the usage of Unified Memory Architecture.
	UMA struct {
		// RAM is the memory usage for loading the GGUF file in RAM.
		RAM GGUFBytesScalar `json:"ram"`
		// VRAM is the memory usage for loading the GGUF file in VRAM.
		VRAM GGUFBytesScalar `json:"vram"`
	} `json:"uma"`
	// NonUMA represents the usage of Non-Unified Memory Architecture.
	NonUMA struct {
		// RAM is the memory usage for loading the GGUF file in RAM.
		RAM GGUFBytesScalar `json:"ram"`
		// VRAM is the memory usage for loading the GGUF file in VRAM.
		VRAM GGUFBytesScalar `json:"vram"`
	} `json:"nonUMA"`
}

LLaMACppUsageEstimateMemorySummary represents the memory summary of the usage for loading the GGUF file in llama.cpp.

type LLaMACppUsageEstimateOption ¶

type LLaMACppUsageEstimateOption func(*_LLaMACppUsageEstimateOptions)

func WithArchitecture ¶

func WithArchitecture(arch GGUFArchitectureMetadata) LLaMACppUsageEstimateOption

WithArchitecture sets the architecture for the estimate.

Allows reusing the same GGUFArchitectureMetadata for multiple estimates.

func WithCacheKeyType ¶

func WithCacheKeyType(t GGMLType) LLaMACppUsageEstimateOption

WithCacheKeyType sets the cache key type for the estimate.

func WithCacheValueType ¶

func WithCacheValueType(t GGMLType) LLaMACppUsageEstimateOption

WithCacheValueType sets the cache value type for the estimate.

func WithContextSize ¶

func WithContextSize(size int32) LLaMACppUsageEstimateOption

WithContextSize sets the context size for the estimate.

func WithDrafter ¶

func WithDrafter(dft *LLaMACppUsageEstimate) LLaMACppUsageEstimateOption

WithDrafter sets the drafter estimate usage.

func WithFlashAttention ¶

func WithFlashAttention() LLaMACppUsageEstimateOption

WithFlashAttention sets the flash attention flag.

func WithLogicalBatchSize ¶ added in v0.5.5

func WithLogicalBatchSize(size int32) LLaMACppUsageEstimateOption

WithLogicalBatchSize sets the logical batch size for the estimate.

func WithMultimodalProjector ¶

func WithMultimodalProjector(mmp *LLaMACppUsageEstimate) LLaMACppUsageEstimateOption

WithMultimodalProjector sets the multimodal projector estimate usage.

func WithOffloadLayers ¶

func WithOffloadLayers(layers uint64) LLaMACppUsageEstimateOption

WithOffloadLayers sets the number of layers to offload.

func WithParallelSize ¶

func WithParallelSize(size int32) LLaMACppUsageEstimateOption

WithParallelSize sets the (decoding sequences) parallel size for the estimate.

func WithPhysicalBatchSize ¶

func WithPhysicalBatchSize(size int32) LLaMACppUsageEstimateOption

WithPhysicalBatchSize sets the physical batch size for the estimate.

func WithTokenizer ¶

func WithTokenizer(tokenizer GGUFTokenizerMetadata) LLaMACppUsageEstimateOption

WithTokenizer sets the tokenizer for the estimate.

Allows reusing the same GGUFTokenizerMetadata for multiple estimates.

func WithinMaxContextSize ¶

func WithinMaxContextSize() LLaMACppUsageEstimateOption

WithinMaxContextSize caps the context size at the maximum when the given context size exceeds it.

func WithoutOffloadKVCache ¶

func WithoutOffloadKVCache() LLaMACppUsageEstimateOption

WithoutOffloadKVCache disables offloading the KV cache.

type LLaMACppUsageEstimateSummary ¶

type LLaMACppUsageEstimateSummary struct {
	Memory []LLaMACppUsageEstimateMemorySummary `json:"memory"`

	// Architecture describes what architecture this model implements.
	Architecture string `json:"architecture"`
	// ContextSize is the size of the context.
	ContextSize uint64 `json:"contextSize"`
	// FlashAttention indicates whether flash attention is enabled,
	// true for enabled.
	FlashAttention bool `json:"flashAttention"`
	// NoMMap is the flag to indicate whether the file must be loaded without mmap,
	// true for total loaded.
	NoMMap bool `json:"noMMap"`
	// EmbeddingOnly is the flag to indicate whether the model is used for embedding only,
	// true for embedding only.
	EmbeddingOnly bool `json:"embeddingOnly"`
}

LLaMACppUsageEstimateSummary represents the summary of the usage for loading the GGUF file in llama.cpp.

type LLaMACppWeightUsage ¶

type LLaMACppWeightUsage struct {
	// Input is the memory usage for loading input tensors.
	Input GGUFBytesScalar `json:"input"`
	// Compute is the memory usage for loading compute tensors.
	Compute GGUFBytesScalar `json:"compute"`
	// Output is the memory usage for loading output tensors.
	Output GGUFBytesScalar `json:"output"`
}

LLaMACppWeightUsage represents the memory usage of loading weights in llama.cpp.

func (LLaMACppWeightUsage) Sum ¶

func (u LLaMACppWeightUsage) Sum() GGUFBytesScalar

type OllamaModel ¶

type OllamaModel struct {
	Schema        string             `json:"schema"`
	Registry      string             `json:"registry"`
	Namespace     string             `json:"namespace"`
	Repository    string             `json:"repository"`
	Tag           string             `json:"tag"`
	SchemaVersion uint32             `json:"schemaVersion"`
	MediaType     string             `json:"mediaType"`
	Config        OllamaModelLayer   `json:"config"`
	Layers        []OllamaModelLayer `json:"layers"`

	// Client is the http client used to complete the OllamaModel's network operations.
	//
	// When this field is nil,
	// it will be set to the client used by OllamaModel.Complete.
	//
	// When this field is set,
	// the network operations will be done with this client.
	Client *http.Client `json:"-"`
}

OllamaModel represents an Ollama model. Its manifest (including MediaType, Config and Layers) can be completed further by calling the Complete method.

func ParseOllamaModel ¶

func ParseOllamaModel(model string) *OllamaModel

ParseOllamaModel parses the given Ollama model string, and returns the OllamaModel, or nil if the model is invalid.
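
To illustrate how a model string maps onto the Schema, Registry, Namespace, Repository and Tag fields, the sketch below splits a reference with the standard library. `modelRef` and `parseRef` are hypothetical, and the defaults (`https`, `registry.ollama.ai`, `library`, `latest`) are assumptions based on common Ollama usage. ParseOllamaModel may validate or default differently.

```go
package main

import (
	"fmt"
	"strings"
)

// modelRef is a hypothetical stand-in for the addressing fields of OllamaModel.
type modelRef struct {
	Schema, Registry, Namespace, Repository, Tag string
}

// parseRef splits "[schema://][registry/][namespace/]repository[:tag]".
// The defaults below are assumptions, not taken from this package.
func parseRef(s string) *modelRef {
	ref := modelRef{Schema: "https", Registry: "registry.ollama.ai", Namespace: "library", Tag: "latest"}
	if i := strings.Index(s, "://"); i >= 0 {
		ref.Schema, s = s[:i], s[i+3:]
	}
	// A tag is introduced by a ':' that appears after the last '/'.
	if i := strings.LastIndex(s, ":"); i > strings.LastIndex(s, "/") {
		ref.Tag, s = s[i+1:], s[:i]
	}
	parts := strings.Split(s, "/")
	switch len(parts) {
	case 1:
		ref.Repository = parts[0]
	case 2:
		ref.Namespace, ref.Repository = parts[0], parts[1]
	case 3:
		ref.Registry, ref.Namespace, ref.Repository = parts[0], parts[1], parts[2]
	default:
		return nil
	}
	if ref.Repository == "" {
		return nil
	}
	return &ref
}

func main() {
	fmt.Printf("%+v\n", *parseRef("gemma2:2b"))
	fmt.Println(parseRef("") == nil) // true
}
```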

func (*OllamaModel) Complete ¶

func (om *OllamaModel) Complete(ctx context.Context, cli *http.Client) error

Complete completes the OllamaModel with the given context and http client.

func (*OllamaModel) GetLayer ¶

func (om *OllamaModel) GetLayer(mediaType string) (OllamaModelLayer, bool)

GetLayer returns the OllamaModelLayer with the given media type, and true if found, and false otherwise.

func (*OllamaModel) License ¶

func (om *OllamaModel) License(ctx context.Context, cli *http.Client) ([]string, error)

License returns the license of the OllamaModel.

func (*OllamaModel) Messages ¶

func (om *OllamaModel) Messages(ctx context.Context, cli *http.Client) ([]json.RawMessage, error)

Messages returns the messages of the OllamaModel.

func (*OllamaModel) Params ¶

func (om *OllamaModel) Params(ctx context.Context, cli *http.Client) (map[string]any, error)

Params returns the parameters of the OllamaModel.

func (*OllamaModel) SearchLayers ¶

func (om *OllamaModel) SearchLayers(mediaTypeRegex *regexp.Regexp) []OllamaModelLayer

SearchLayers returns a list of OllamaModelLayer with the media type that matches the given regex.

func (*OllamaModel) String ¶

func (om *OllamaModel) String() string

func (*OllamaModel) System ¶

func (om *OllamaModel) System(ctx context.Context, cli *http.Client) (string, error)

System returns the system message of the OllamaModel.

func (*OllamaModel) Template ¶

func (om *OllamaModel) Template(ctx context.Context, cli *http.Client) (string, error)

Template returns the template of the OllamaModel.

func (*OllamaModel) WebPageURL ¶

func (om *OllamaModel) WebPageURL() *url.URL

WebPageURL returns the Ollama web page URL of the OllamaModel.

type OllamaModelLayer ¶

type OllamaModelLayer struct {
	MediaType string `json:"mediaType"`
	Size      uint64 `json:"size"`
	Digest    string `json:"digest"`

	// Root points to the root OllamaModel,
	// which is never serialized or deserialized.
	//
	// When OllamaModel.Complete is called,
	// this field will be set to the OllamaModel itself.
	// If not, this field will be nil,
	// and must be set manually to the root OllamaModel before calling the methods of OllamaModelLayer.
	Root *OllamaModel `json:"-"`
}

OllamaModelLayer represents an Ollama model layer, its digest can be used to download the artifact.

func (*OllamaModelLayer) BlobURL ¶

func (ol *OllamaModelLayer) BlobURL() *url.URL

BlobURL returns the blob URL of the OllamaModelLayer.
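
A layer's blob address is derived from the root model's location plus the layer digest. The sketch below builds such a URL in the style of the OCI distribution blob endpoint (`/v2/<name>/blobs/<digest>`). Whether BlobURL produces exactly this shape is an assumption; `blobURL`, the registry host and the digest are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// blobURL is a hypothetical helper that joins the model location and the
// layer digest into an OCI-style blob address.
func blobURL(schema, registry, namespace, repository, digest string) *url.URL {
	return &url.URL{
		Scheme: schema,
		Host:   registry,
		Path:   "/v2/" + namespace + "/" + repository + "/blobs/" + digest,
	}
}

func main() {
	u := blobURL("https", "registry.ollama.ai", "library", "llama3", "sha256:0123abcd")
	fmt.Println(u.String())
}
```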

func (*OllamaModelLayer) FetchBlob ¶

func (ol *OllamaModelLayer) FetchBlob(ctx context.Context, cli *http.Client) ([]byte, error)

FetchBlob fetches the blob of the OllamaModelLayer with the given context and http client, and returns the response body as bytes.

func (*OllamaModelLayer) FetchBlobFunc ¶

func (ol *OllamaModelLayer) FetchBlobFunc(ctx context.Context, cli *http.Client, process func(*http.Response) error) error

FetchBlobFunc fetches the blob of the OllamaModelLayer with the given context and http client, and processes the response with the given function.

Directories ¶

Path	Synopsis
cmd/gguf-parser	Module
util/anyx
util/bytex
util/funcx
util/httpx
util/json
util/osx
util/ptr
util/signalx
util/stringx
