Documentation
Index

type Processor
    func NewProcessor
    func NewProcessorFromPath
    func (*Processor) Decode
    func (*Processor) DecodeTokens
    func (*Processor) VocabularySize
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Processor
type Processor struct {
// contains filtered or unexported fields
}
Processor represents a SentencePiece processor (tokenizer). A Processor converts input text into the sequence of tokens used by LLMs, and back. The mapping between token IDs and the text they represent is read from the model proto provided to the constructor; it remains the same across all calls to the Encode method.
The term "processor" comes from the original C++ SentencePiece library and its Python bindings.
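A minimal usage sketch follows. It assumes the module's import path is github.com/eliben/go-sentencepiece (imported here under the name sentencepiece), that Encode returns a slice of tokens with ID fields as described under DecodeTokens below, and that the constructors return (*Processor, error); the model file name is illustrative, and exact signatures may differ.

package main

import (
	"fmt"
	"log"

	sentencepiece "github.com/eliben/go-sentencepiece" // assumed import path
)

func main() {
	// Load the model proto once; the ID <-> text mapping is fixed from here on.
	proc, err := sentencepiece.NewProcessorFromPath("tokenizer.model") // illustrative path
	if err != nil {
		log.Fatal(err)
	}

	// Encode text into tokens, then collect their IDs.
	tokens := proc.Encode("hello world")
	ids := make([]int, len(tokens))
	for i, t := range tokens {
		ids[i] = t.ID
	}

	// Decode reverses the mapping and reconstructs the text.
	fmt.Println(proc.Decode(ids))
}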
func NewProcessor
NewProcessor creates a new Processor from a reader with the protobuf data.
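Because NewProcessor takes a reader rather than a file path, the model proto can come from any io.Reader. A sketch, under the same import-path assumption as above, using an embedded model file (the file name is illustrative):

package tok

import (
	"bytes"
	_ "embed"

	sentencepiece "github.com/eliben/go-sentencepiece" // assumed import path
)

// tokenizer.model is an illustrative file name; embed your own model proto.
//
//go:embed tokenizer.model
var modelProto []byte

// newEmbeddedProcessor builds a Processor from the embedded proto bytes
// by wrapping them in an io.Reader.
func newEmbeddedProcessor() (*sentencepiece.Processor, error) {
	return sentencepiece.NewProcessor(bytes.NewReader(modelProto))
}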
func NewProcessorFromPath
NewProcessorFromPath creates a new Processor from a file path to the protobuf data.
func (*Processor) Decode
Decode translates a list of IDs produced by [Encode] back into the string it represents.
func (*Processor) DecodeTokens
DecodeTokens is a convenience wrapper around [Decode], accepting a list of tokens as returned by [Encode]. It only uses the ID fields of tokens to decode the text.
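A sketch illustrating both decoding paths, under the same assumptions as the earlier examples (proc is a *Processor from one of the constructors, Encode returns tokens with ID fields, and both methods return the reconstructed string):

package tok

import (
	"fmt"

	sentencepiece "github.com/eliben/go-sentencepiece" // assumed import path
)

// roundTrip encodes text and decodes it back via both methods.
func roundTrip(proc *sentencepiece.Processor, text string) {
	tokens := proc.Encode(text)

	// Decode works on the raw IDs extracted from the tokens.
	ids := make([]int, len(tokens))
	for i, t := range tokens {
		ids[i] = t.ID
	}
	fromIDs := proc.Decode(ids)

	// DecodeTokens accepts the tokens directly and uses only their ID fields.
	fromTokens := proc.DecodeTokens(tokens)

	fmt.Println(fromIDs == fromTokens) // both should reconstruct the same text
}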
func (*Processor) VocabularySize
VocabularySize returns the vocabulary size from the proto model.
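As a small illustration, using the imports from the sketch above and assuming VocabularySize returns an int with valid token IDs lying in [0, VocabularySize()):

// checkIDs verifies that every token ID produced by Encode falls inside
// the vocabulary reported by VocabularySize.
func checkIDs(proc *sentencepiece.Processor, text string) error {
	size := proc.VocabularySize()
	for _, t := range proc.Encode(text) {
		if t.ID < 0 || t.ID >= size {
			return fmt.Errorf("token ID %d outside vocabulary of size %d", t.ID, size)
		}
	}
	return nil
}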
Directories

Path | Synopsis
---|---
internal |
internal/priorityqueue | Package priorityqueue provides a generic priority queue with Insert and PopMax operations.
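internal/priorityqueue is an internal package and cannot be imported from outside this module. Purely as an illustration of the shape the synopsis describes, and not the package's actual API or implementation, a generic max-priority queue exposing Insert and PopMax could be sketched on top of a binary heap like this:

package pqsketch

import "cmp"

// entry pairs a value with its priority.
type entry[T any, P cmp.Ordered] struct {
	value    T
	priority P
}

// Queue is an illustrative generic max-priority queue backed by a binary heap.
type Queue[T any, P cmp.Ordered] struct {
	items []entry[T, P]
}

// Insert adds value with the given priority and sifts it up the heap.
func (q *Queue[T, P]) Insert(value T, priority P) {
	q.items = append(q.items, entry[T, P]{value, priority})
	i := len(q.items) - 1
	for i > 0 {
		parent := (i - 1) / 2
		if q.items[parent].priority >= q.items[i].priority {
			break
		}
		q.items[parent], q.items[i] = q.items[i], q.items[parent]
		i = parent
	}
}

// PopMax removes and returns the highest-priority value; ok is false
// when the queue is empty.
func (q *Queue[T, P]) PopMax() (value T, ok bool) {
	if len(q.items) == 0 {
		return value, false
	}
	top := q.items[0].value
	last := len(q.items) - 1
	q.items[0] = q.items[last]
	q.items = q.items[:last]
	// Sift the new root down until the heap property is restored.
	i := 0
	for {
		l, r, largest := 2*i+1, 2*i+2, i
		if l < len(q.items) && q.items[l].priority > q.items[largest].priority {
			largest = l
		}
		if r < len(q.items) && q.items[r].priority > q.items[largest].priority {
			largest = r
		}
		if largest == i {
			break
		}
		q.items[i], q.items[largest] = q.items[largest], q.items[i]
		i = largest
	}
	return top, true
}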