Documentation
Overview
Package tokenizer implements file tokenization used by the enry content classifier. This package is an implementation detail of enry and should not be imported by other packages.
Index
Constants
const ByteLimit = 100000
ByteLimit defines the maximum prefix of an input text that will be tokenized.
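Applying the limit amounts to truncating the input before tokenization. The helper below is an illustrative sketch of that truncation step, not part of the package API:

```go
package main

import "fmt"

// ByteLimit mirrors the package constant: only this many leading
// bytes of the input are considered for tokenization.
const ByteLimit = 100000

// limit returns at most the first ByteLimit bytes of content.
// This is a hypothetical helper showing the truncation step.
func limit(content []byte) []byte {
	if len(content) > ByteLimit {
		return content[:ByteLimit]
	}
	return content
}

func main() {
	small := []byte("package main")
	large := make([]byte, ByteLimit+5000)
	// Short inputs pass through unchanged; long ones are capped.
	fmt.Println(len(limit(small)), len(limit(large)))
}
```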
Variables
This section is empty.
Functions
func Tokenize
func Tokenize(content []byte) []string
Tokenize returns lexical tokens from content. The returned tokens match those produced by the Linguist library. At most the first ByteLimit bytes of content are tokenized.
BUG: Until https://github.com/src-d/enry/issues/193 is resolved, there are some differences between this function and the Linguist output.
Types
This section is empty.