Package tokenizer implements file tokenization used by the enry content
classifier. This package is an implementation detail of enry and should not
be imported by other packages.

Tokenize returns language-agnostic lexical tokens extracted from content. The
tokens returned are intended to match those produced by the Linguist library.
At most the first 100KB of content is tokenized.