Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type BPETokenizer
type BPETokenizer struct {
// contains filtered or unexported fields
}
BPETokenizer is a higher-level tokenizer that includes byte-level pre-tokenization.
func New
func New(
	preTokenizer *bytelevelpretokenizer.ByteLevelPreTokenizer,
	model *bpemodel.BPEModel,
	vocab *vocabulary.Vocabulary,
) *BPETokenizer
New returns a new BPETokenizer.
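New's collaborators come from separate packages whose construction is not documented on this page. The sketch below shows only the call shape, with placeholder variables standing in for fully initialized components; all import paths are assumptions inferred from the package identifiers in the signature, not confirmed by this page.

package main

import (
	// Assumed import paths; adjust to your module layout.
	"github.com/nlpodyssey/gotokenizers/models/bpemodel"
	"github.com/nlpodyssey/gotokenizers/pretokenizers/bytelevelpretokenizer"
	"github.com/nlpodyssey/gotokenizers/vocabulary"
	"github.com/nlpodyssey/spago/pkg/nlp/tokenizers/bpetokenizer"
)

func main() {
	// Placeholders only: in real code each component must be fully
	// constructed before use (see its own package for details).
	var (
		preTokenizer *bytelevelpretokenizer.ByteLevelPreTokenizer
		model        *bpemodel.BPEModel
		vocab        *vocabulary.Vocabulary
	)
	tokenizer := bpetokenizer.New(preTokenizer, model, vocab)
	_ = tokenizer
}

In practice, NewFromModelFolder (below) is the more convenient entry point when the model files are already on disk.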
func NewFromModelFolder
func NewFromModelFolder(path string) (*BPETokenizer, error)
NewFromModelFolder returns a new BPETokenizer built from a pre-trained Roberta-compatible model, given the path to the folder containing the separate model and configuration files.
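A minimal loading sketch. The import path and the model folder are assumptions: the path below is hypothetical and should point at a folder holding the RoBERTa-compatible model and configuration files (typically a vocabulary file and a merges file for RoBERTa-style BPE).

package main

import (
	"log"

	"github.com/nlpodyssey/spago/pkg/nlp/tokenizers/bpetokenizer" // assumed import path
)

func main() {
	// Hypothetical folder containing the RoBERTa-compatible files.
	tokenizer, err := bpetokenizer.NewFromModelFolder("models/roberta-base")
	if err != nil {
		log.Fatalf("loading tokenizer: %v", err)
	}
	log.Printf("tokenizer ready: %T", tokenizer)
}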
func (*BPETokenizer) Detokenize
func (t *BPETokenizer) Detokenize(ids []int) string
Detokenize flattens and merges a list of token IDs into a single string.
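A minimal sketch in godoc example style, assuming the import path and model folder from the loading example above; the token IDs are made up and would normally come from a previous Encode call.

package bpetokenizer_test

import (
	"fmt"

	"github.com/nlpodyssey/spago/pkg/nlp/tokenizers/bpetokenizer" // assumed import path
)

func ExampleBPETokenizer_Detokenize() {
	tokenizer, err := bpetokenizer.NewFromModelFolder("models/roberta-base") // hypothetical path
	if err != nil {
		panic(err)
	}
	// Hypothetical token IDs, e.g. obtained from a previous Encode call.
	fmt.Println(tokenizer.Detokenize([]int{0, 31414, 232, 2}))
}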
func (*BPETokenizer) Encode
func (t *BPETokenizer) Encode(text string) (*encodings.Encoding, error)
Encode converts a text into an encoded token representation useful for Transformer architectures. It tokenizes using byte-level pre-tokenization and BPE tokenization.
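A minimal sketch, under the same assumptions as above (import path, hypothetical model folder). The fields of encodings.Encoding are not documented on this page, so the value is printed generically rather than accessed field by field.

package bpetokenizer_test

import (
	"fmt"

	"github.com/nlpodyssey/spago/pkg/nlp/tokenizers/bpetokenizer" // assumed import path
)

func ExampleBPETokenizer_Encode() {
	tokenizer, err := bpetokenizer.NewFromModelFolder("models/roberta-base") // hypothetical path
	if err != nil {
		panic(err)
	}
	encoding, err := tokenizer.Encode("The quick brown fox jumps over the lazy dog.")
	if err != nil {
		panic(err)
	}
	// Dump the encoding for inspection; its concrete fields are defined
	// in the encodings package.
	fmt.Printf("%+v\n", encoding)
}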
func (*BPETokenizer) SetExtraSpecialTokens
func (t *BPETokenizer) SetExtraSpecialTokens(extra map[int]string)
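SetExtraSpecialTokens carries no doc comment; from its signature it accepts a map from token ID to token string, presumably registering special tokens beyond those in the base vocabulary. A minimal sketch with made-up IDs and token strings:

package bpetokenizer_test

import (
	"github.com/nlpodyssey/spago/pkg/nlp/tokenizers/bpetokenizer" // assumed import path
)

func ExampleBPETokenizer_SetExtraSpecialTokens() {
	tokenizer, err := bpetokenizer.NewFromModelFolder("models/roberta-base") // hypothetical path
	if err != nil {
		panic(err)
	}
	// Hypothetical extra special tokens, keyed by token ID as implied
	// by the map[int]string parameter.
	tokenizer.SetExtraSpecialTokens(map[int]string{
		50265: "<mask_2>",
		50266: "<sep_2>",
	})
}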
func (*BPETokenizer) Tokenize
func (t *BPETokenizer) Tokenize(text string) ([]tokenizers.StringOffsetsPair, error)
Tokenize performs byte-level pre-tokenization and BPE tokenization.
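A minimal sketch, under the same assumptions as the examples above. Each returned tokenizers.StringOffsetsPair couples a token string with its offsets in the input text; the pairs are printed generically since their field layout is defined in the tokenizers package, not on this page.

package bpetokenizer_test

import (
	"fmt"

	"github.com/nlpodyssey/spago/pkg/nlp/tokenizers/bpetokenizer" // assumed import path
)

func ExampleBPETokenizer_Tokenize() {
	tokenizer, err := bpetokenizer.NewFromModelFolder("models/roberta-base") // hypothetical path
	if err != nil {
		panic(err)
	}
	tokens, err := tokenizer.Tokenize("Hello world!")
	if err != nil {
		panic(err)
	}
	for _, pair := range tokens {
		// One token string plus its character offsets per pair.
		fmt.Printf("%+v\n", pair)
	}
}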