Documentation
Overview
Package tokenizer implements file tokenization used by the enry content classifier. This package is an implementation detail of enry and should not be imported by other packages.
Index
Constants
const ByteLimit = 100000
ByteLimit defines the maximum prefix of an input text that will be tokenized.
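Applying the limit amounts to truncating the input before tokenization. The helper below is an illustrative sketch of that truncation step, not part of the package API:

```go
package main

import "fmt"

// ByteLimit mirrors the package constant: only this many leading
// bytes of the input are considered for tokenization.
const ByteLimit = 100000

// limit returns at most the first ByteLimit bytes of content.
// This is a hypothetical helper showing the truncation step.
func limit(content []byte) []byte {
	if len(content) > ByteLimit {
		return content[:ByteLimit]
	}
	return content
}

func main() {
	small := []byte("package main")
	large := make([]byte, ByteLimit+5000)
	// Short inputs pass through unchanged; long ones are capped.
	fmt.Println(len(limit(small)), len(limit(large)))
}
```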
Variables
This section is empty.
Functions
func Tokenize
func Tokenize(content []byte) []string
Tokenize returns lexical tokens from content. The returned tokens match those produced by the Linguist library. At most the first ByteLimit bytes of content are tokenized.
BUG: Until https://github.com/src-d/enry/issues/193 is resolved, there are some differences between this function and the Linguist output.
Types
This section is empty.