Documentation
Overview
Package cl100kbase registers the "cl100k_base" tokenizer with gotoken. To use this tokenizer:
	import (
		"github.com/peterheb/gotoken"
		_ "github.com/peterheb/gotoken/cl100kbase"
	)
	...
	tok, err := gotoken.GetTokenizer("cl100k_base")
This file was generated from the following data:
- Source URL: https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken
- Source SHA-256: 223921b76ee99bde995b7ff738513eef100fb51d18c93597a113bcffe865b2a7
- Generated: 2023-04-16T22:17:09Z
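The source digest listed above can be used to check a locally downloaded copy of the encoding file before trusting it. The following is a minimal sketch using only the Go standard library; `verifySHA256` and `sourceSHA256` are hypothetical names chosen for illustration and are not part of gotoken.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// sourceSHA256 is the digest listed in this documentation for
// cl100k_base.tiktoken (copied here for illustration).
const sourceSHA256 = "223921b76ee99bde995b7ff738513eef100fb51d18c93597a113bcffe865b2a7"

// verifySHA256 reports whether the SHA-256 digest of data equals wantHex.
// The comparison ignores letter case and surrounding whitespace in wantHex.
// This is a hypothetical helper, not a gotoken function.
func verifySHA256(data []byte, wantHex string) bool {
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	return got == strings.ToLower(strings.TrimSpace(wantHex))
}
```

Calling `verifySHA256(fileBytes, sourceSHA256)` on the bytes fetched from the source URL should return true if the file is unchanged since this package was generated; a false result suggests the download is corrupt or the upstream file has been replaced.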
Constants
	const (
		EndOfText   = "<|endoftext|>"
		FIMPrefix   = "<|fim_prefix|>"
		FIMMiddle   = "<|fim_middle|>"
		FIMSuffix   = "<|fim_suffix|>"
		IMStart     = "<|im_start|>" // these are documented in the tiktoken README
		IMEnd       = "<|im_end|>"   // but aren't in the Python code
		EndOfPrompt = "<|endofprompt|>"
	)
These special tokens are defined by this encoding.
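Because these strings are ordinary text, they can also turn up in untrusted input. The sketch below shows one way to detect them before encoding, using only the standard library. It keeps a local copy of the token strings so that it stands alone; `specialTokens` and `findSpecialTokens` are hypothetical names, and how gotoken itself treats special tokens in input is not described on this page.

```go
package main

import "strings"

// specialTokens is a local copy of the constant values listed above,
// duplicated here so the sketch has no external dependencies.
var specialTokens = []string{
	"<|endoftext|>",
	"<|fim_prefix|>",
	"<|fim_middle|>",
	"<|fim_suffix|>",
	"<|im_start|>",
	"<|im_end|>",
	"<|endofprompt|>",
}

// findSpecialTokens returns every special token that occurs in s,
// in the order the tokens appear in specialTokens (not in s).
// This is a hypothetical helper, not a gotoken function.
func findSpecialTokens(s string) []string {
	var found []string
	for _, tok := range specialTokens {
		if strings.Contains(s, tok) {
			found = append(found, tok)
		}
	}
	return found
}
```

In real code, importing the constants from the cl100kbase package would avoid duplicating the strings; the local copy is only there to keep the example self-contained.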
Variables
This section is empty.
Functions
This section is empty.
Types
This section is empty.