bpetokenizer

package
v0.1.2
Warning

This package is not in the latest version of its module.

Published: Oct 17, 2022 License: BSD-2-Clause Imports: 12 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BPETokenizer

type BPETokenizer struct {
	// contains filtered or unexported fields
}

BPETokenizer is a higher-level tokenizer that combines byte-level pre-tokenization with BPE tokenization.
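Byte-level pre-tokenization, as popularized by GPT-2 and RoBERTa, usually maps each of the 256 possible byte values to a printable Unicode character, so the BPE step never sees raw whitespace or control bytes. The sketch below builds that GPT-2-style table and applies it to a string. Whether this package uses exactly this table is an assumption, and `bytesToUnicode` and `byteLevelEncode` are hypothetical names, not part of the package API.

```go
package main

import (
	"fmt"
	"strings"
)

// bytesToUnicode builds a GPT-2-style table mapping every byte value to a
// printable rune. Printable Latin-1 bytes map to themselves; the remaining
// bytes (space, control characters, ...) are shifted to runes >= 256.
func bytesToUnicode() [256]rune {
	var table [256]rune
	var direct [256]bool
	for _, r := range [][2]int{{'!', '~'}, {'¡', '¬'}, {'®', 'ÿ'}} {
		for b := r[0]; b <= r[1]; b++ {
			direct[b] = true
		}
	}
	n := 0
	for b := 0; b < 256; b++ {
		if direct[b] {
			table[b] = rune(b)
		} else {
			table[b] = rune(256 + n)
			n++
		}
	}
	return table
}

// byteLevelEncode rewrites every UTF-8 byte of s using the table, so the
// result contains only printable runes.
func byteLevelEncode(s string) string {
	table := bytesToUnicode()
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		sb.WriteRune(table[s[i]])
	}
	return sb.String()
}

func main() {
	fmt.Println(byteLevelEncode("Hello world")) // HelloĠworld
}
```

The space becomes `Ġ` (U+0120), which is why byte-level BPE vocabularies contain tokens such as `Ġworld`; multi-byte characters are split into one rune per byte (for example, `é` becomes `Ã©`).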

func New

New returns a new BPETokenizer.

func NewFromModelFolder

func NewFromModelFolder(path string) (*BPETokenizer, error)

NewFromModelFolder returns a new BPETokenizer built from a pre-trained RoBERTa-compatible model, given the path to the folder that contains the model and configuration files.

func (*BPETokenizer) Detokenize

func (t *BPETokenizer) Detokenize(ids []int) string

Detokenize flattens and merges a list of IDs into a single string.
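For a byte-level BPE tokenizer, detokenization typically means looking up each ID's token string, concatenating the strings, and mapping every rune back to the byte it stands for. The sketch below shows that idea under the assumption of a GPT-2-style byte-to-unicode table (included so the example stands alone) and a plain `map[int]string` vocabulary; `detokenize` and `idToToken` are hypothetical, not this package's internals.

```go
package main

import (
	"fmt"
	"strings"
)

// bytesToUnicode builds the GPT-2-style byte -> printable rune table.
func bytesToUnicode() [256]rune {
	var table [256]rune
	var direct [256]bool
	for _, r := range [][2]int{{'!', '~'}, {'¡', '¬'}, {'®', 'ÿ'}} {
		for b := r[0]; b <= r[1]; b++ {
			direct[b] = true
		}
	}
	n := 0
	for b := 0; b < 256; b++ {
		if direct[b] {
			table[b] = rune(b)
		} else {
			table[b] = rune(256 + n)
			n++
		}
	}
	return table
}

// detokenize concatenates the token strings for ids and maps each rune back
// to its original byte. Unknown IDs and runes outside the table are skipped.
func detokenize(ids []int, idToToken map[int]string) string {
	inverse := make(map[rune]byte, 256)
	for b, r := range bytesToUnicode() {
		inverse[r] = byte(b)
	}
	var joined strings.Builder
	for _, id := range ids {
		joined.WriteString(idToToken[id])
	}
	out := make([]byte, 0, joined.Len())
	for _, r := range joined.String() {
		if b, ok := inverse[r]; ok {
			out = append(out, b)
		}
	}
	return string(out)
}

func main() {
	idToToken := map[int]string{0: "Hello", 1: "Ġwor", 2: "ld", 3: "Ã©"}
	fmt.Println(detokenize([]int{0, 1, 2}, idToToken)) // Hello world
	fmt.Println(detokenize([]int{3}, idToToken))       // é
}
```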

func (*BPETokenizer) Encode

func (t *BPETokenizer) Encode(text string) (*encodings.Encoding, error)

Encode converts text into an encoded token representation suitable for Transformer architectures. It tokenizes using byte-level pre-tokenization followed by BPE tokenization.

func (*BPETokenizer) SetExtraSpecialTokens

func (t *BPETokenizer) SetExtraSpecialTokens(extra map[int]string)

func (*BPETokenizer) Tokenize

func (t *BPETokenizer) Tokenize(text string) ([]tokenizers.StringOffsetsPair, error)

Tokenize performs byte-level pre-tokenization and BPE tokenization.
