Documentation ¶
Overview ¶
Package enry implements multiple strategies for programming language identification.
Identification is based on the file name and file content, using a series of strategies to narrow down the possible options. Each strategy is available as a separate API call, in addition to the main entry point:
GetLanguage(filename string, content []byte) (language string)
It is a Go port of the Ruby library https://github.com/github/linguist. Upstream Linguist YAML files are used to generate the data structures in the data package.
Index ¶
- Constants
- Variables
- func GetLanguage(filename string, content []byte) (language string)
- func GetLanguageByAlias(alias string) (lang string, ok bool)
- func GetLanguageByClassifier(content []byte, candidates []string) (language string, safe bool)
- func GetLanguageByContent(filename string, content []byte) (language string, safe bool)
- func GetLanguageByEmacsModeline(content []byte) (language string, safe bool)
- func GetLanguageByExtension(filename string) (language string, safe bool)
- func GetLanguageByFilename(filename string) (language string, safe bool)
- func GetLanguageByModeline(content []byte) (language string, safe bool)
- func GetLanguageByShebang(content []byte) (language string, safe bool)
- func GetLanguageBySpecificClassifier(content []byte, candidates []string, classifier Classifier) (language string, safe bool)
- func GetLanguageByVimModeline(content []byte) (language string, safe bool)
- func GetLanguageExtensions(language string) []string
- func GetLanguages(filename string, content []byte) []string
- func GetLanguagesByClassifier(filename string, content []byte, candidates []string) (languages []string)
- func GetLanguagesByContent(filename string, content []byte, _ []string) []string
- func GetLanguagesByEmacsModeline(_ string, content []byte, _ []string) []string
- func GetLanguagesByExtension(filename string, _ []byte, _ []string) []string
- func GetLanguagesByFilename(filename string, _ []byte, _ []string) []string
- func GetLanguagesByModeline(_ string, content []byte, candidates []string) []string
- func GetLanguagesByShebang(_ string, content []byte, _ []string) (languages []string)
- func GetLanguagesBySpecificClassifier(content []byte, candidates []string, classifier Classifier) (languages []string)
- func GetLanguagesByVimModeline(_ string, content []byte, _ []string) []string
- func GetMIMEType(path string, language string) string
- func IsBinary(data []byte) bool
- func IsConfiguration(path string) bool
- func IsDocumentation(path string) bool
- func IsDotFile(path string) bool
- func IsImage(path string) bool
- func IsVendor(path string) bool
- type Classifier
- type Strategy
- type Type
Constants ¶
const OtherLanguage = ""
OtherLanguage is used as a zero value when a function cannot return a specific language.
Variables ¶
var DefaultStrategies = []Strategy{
	GetLanguagesByModeline,
	GetLanguagesByFilename,
	GetLanguagesByShebang,
	GetLanguagesByExtension,
	GetLanguagesByContent,
	GetLanguagesByClassifier,
}
DefaultStrategies is a sequence of strategies used by GetLanguage to detect languages.
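To illustrate how a sequence of strategies can narrow down candidates, here is a minimal, self-contained sketch. It does not use enry itself: the `strategy` type only mirrors the shape of the GetLanguagesBy* functions in the index, and `byExtension`, `byKeyword`, `detect`, and their lookup tables are hypothetical. The rule of stopping as soon as a single candidate remains is an assumption made for the sketch.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// strategy mirrors the shape of the GetLanguagesBy* functions: it receives the
// filename, the content, and the candidates found so far.
type strategy func(filename string, content []byte, candidates []string) []string

// byExtension is a hypothetical strategy backed by a tiny hard-coded table.
func byExtension(filename string, _ []byte, _ []string) []string {
	table := map[string][]string{
		".go": {"Go"},
		".h":  {"C", "C++", "Objective-C"},
	}
	return table[strings.ToLower(filepath.Ext(filename))]
}

// byKeyword is a hypothetical content strategy: it keeps only the candidates
// whose marker string occurs in the content.
func byKeyword(_ string, content []byte, candidates []string) []string {
	markers := map[string]string{
		"C++":         "class ",
		"Objective-C": "@interface",
	}
	var out []string
	for _, lang := range candidates {
		if m, ok := markers[lang]; ok && bytes.Contains(content, []byte(m)) {
			out = append(out, lang)
		}
	}
	return out
}

// detect runs the strategies in order. A strategy that finds nothing leaves
// the previous candidates untouched; a single result ends the search early
// (an assumption of this sketch).
func detect(filename string, content []byte, strategies []strategy) []string {
	var candidates []string
	for _, s := range strategies {
		found := s(filename, content, candidates)
		switch len(found) {
		case 0:
			continue
		case 1:
			return found
		}
		candidates = found
	}
	return candidates
}

func main() {
	chain := []strategy{byExtension, byKeyword}
	fmt.Println(detect("main.go", nil, chain))
	fmt.Println(detect("widget.h", []byte("@interface Widget"), chain))
	fmt.Println(detect("util.h", []byte("int add(int, int);"), chain))
}
```

In this sketch an ambiguous extension such as `.h` stays ambiguous unless a later strategy can single out one language, which is why the caller receives a slice rather than a single name.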
Functions ¶
func GetLanguage ¶
GetLanguage applies a sequence of strategies based on the given filename and content and returns the most probable language.
func GetLanguageByAlias ¶
GetLanguageByAlias returns the language related to the given alias with ok set to true, or OtherLanguage with ok set to false if the alias is not recognized.
func GetLanguageByClassifier ¶ added in v1.2.1
GetLanguageByClassifier returns the most probable language detected for the given content. It uses DefaultClassifier; if no candidates are provided, it returns OtherLanguage.
func GetLanguageByContent ¶
GetLanguageByContent returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
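The (language, safe) contract described here can be sketched as a small helper. `firstAndSafe` is a hypothetical name, not part of enry, and returning an empty string with safe set to false for an empty candidate list is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// firstAndSafe is a hypothetical helper that reduces a list of candidate
// languages to one result. safe is true only when exactly one candidate exists.
func firstAndSafe(candidates []string) (language string, safe bool) {
	if len(candidates) == 0 {
		return "", false // assumption: empty string plays the role of OtherLanguage
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]string(nil), candidates...)
	sort.Strings(sorted)
	return sorted[0], len(sorted) == 1
}

func main() {
	fmt.Println(firstAndSafe([]string{"Go"}))
	fmt.Println(firstAndSafe([]string{"Objective-C", "C++", "C"}))
	fmt.Println(firstAndSafe(nil))
}
```

Callers that need certainty would check safe before trusting the language, and fall back to another strategy when it is false.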
func GetLanguageByEmacsModeline ¶
GetLanguageByEmacsModeline returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByExtension ¶
GetLanguageByExtension returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByFilename ¶
GetLanguageByFilename returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByModeline ¶
GetLanguageByModeline returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByShebang ¶
GetLanguageByShebang returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
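As a rough illustration of shebang-based detection, the sketch below extracts the interpreter name from the first line of a script. `interpreter` is a hypothetical helper and the `env` handling is a simplification; enry's own parsing and its mapping from interpreters to languages may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"strings"
)

// interpreter returns the interpreter named on a "#!" first line, or "" if
// the content does not start with a shebang. Hypothetical helper.
func interpreter(content []byte) string {
	line, _, _ := bytes.Cut(content, []byte("\n"))
	if !bytes.HasPrefix(line, []byte("#!")) {
		return ""
	}
	fields := strings.Fields(string(line[2:]))
	if len(fields) == 0 {
		return ""
	}
	name := path.Base(fields[0])
	if name != "env" {
		return name
	}
	// "#!/usr/bin/env [flags] [VAR=value] prog": skip flags and assignments.
	for _, f := range fields[1:] {
		if strings.HasPrefix(f, "-") || strings.Contains(f, "=") {
			continue
		}
		return path.Base(f)
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", interpreter([]byte("#!/usr/bin/env python3\nprint(1)\n")))
	fmt.Printf("%q\n", interpreter([]byte("#!/bin/sh -e\necho hi\n")))
	fmt.Printf("%q\n", interpreter([]byte("package main\n")))
}
```

A detector built on this would still need a table from interpreter names to languages, and several interpreters can map to one language.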
func GetLanguageBySpecificClassifier ¶ added in v1.2.1
func GetLanguageBySpecificClassifier(content []byte, candidates []string, classifier Classifier) (language string, safe bool)
GetLanguageBySpecificClassifier returns the most probable language for the given content, using the given classifier to detect the language.
func GetLanguageByVimModeline ¶
GetLanguageByVimModeline returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageExtensions ¶
GetLanguageExtensions returns the different extensions being used by the language.
func GetLanguages ¶ added in v1.2.1
GetLanguages applies a sequence of strategies based on the given filename and content and returns the most probable languages. At least one of the arguments should be set. If content is missing, language detection is based on the filename alone. The function does not read the file when given empty content.
func GetLanguagesByClassifier ¶ added in v1.2.1
func GetLanguagesByClassifier(filename string, content []byte, candidates []string) (languages []string)
GetLanguagesByClassifier uses DefaultClassifier as a Classifier and returns a slice of possible languages sorted by decreasing probability. If there are no candidates, it returns nil. It complies with the signature to be a Strategy type.
func GetLanguagesByContent ¶ added in v1.2.1
GetLanguagesByContent returns a slice of languages for the given content. It is a Strategy that uses content-based regexp heuristics and a filename extension.
func GetLanguagesByEmacsModeline ¶ added in v1.2.1
GetLanguagesByEmacsModeline returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
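For readers unfamiliar with Emacs modelines, the following sketch extracts the mode from a first line such as `-*- mode: ruby; coding: utf-8 -*-` or the short form `-*- C++ -*-`. `emacsMode` is a hypothetical helper that only inspects the first line; it is not enry's matcher, which may look at other lines and then map the mode name to a language.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// emacsMode returns the mode named in an Emacs "-*- ... -*-" modeline on the
// first line of content, or "" if none is found. Hypothetical helper.
func emacsMode(content []byte) string {
	line, _, _ := bytes.Cut(content, []byte("\n"))
	s := string(line)
	start := strings.Index(s, "-*-")
	if start < 0 {
		return ""
	}
	rest := s[start+3:]
	end := strings.Index(rest, "-*-")
	if end < 0 {
		return ""
	}
	body := strings.TrimSpace(rest[:end])
	// Short form: the body is just the mode name.
	if !strings.Contains(body, ":") {
		return body
	}
	// Long form: "key: value" pairs separated by semicolons.
	for _, part := range strings.Split(body, ";") {
		key, value, ok := strings.Cut(part, ":")
		if ok && strings.EqualFold(strings.TrimSpace(key), "mode") {
			return strings.TrimSpace(value)
		}
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", emacsMode([]byte("# -*- mode: ruby; coding: utf-8 -*-\nputs 1\n")))
	fmt.Printf("%q\n", emacsMode([]byte("/* -*- C++ -*- */\n")))
	fmt.Printf("%q\n", emacsMode([]byte("plain text\n")))
}
```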
func GetLanguagesByExtension ¶ added in v1.2.1
GetLanguagesByExtension returns a slice of possible languages for the given filename. It complies with the signature to be a Strategy type.
func GetLanguagesByFilename ¶ added in v1.2.1
GetLanguagesByFilename returns a slice of possible languages for the given filename. It complies with the signature to be a Strategy type.
func GetLanguagesByModeline ¶ added in v1.2.1
GetLanguagesByModeline returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetLanguagesByShebang ¶ added in v1.2.1
GetLanguagesByShebang returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetLanguagesBySpecificClassifier ¶ added in v1.2.1
func GetLanguagesBySpecificClassifier(content []byte, candidates []string, classifier Classifier) (languages []string)
GetLanguagesBySpecificClassifier returns a slice of possible languages. It takes in a Classifier to be used.
func GetLanguagesByVimModeline ¶ added in v1.2.1
GetLanguagesByVimModeline returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetMIMEType ¶ added in v1.7.0
GetMIMEType returns the MIME type of the given file based on its language.
func IsBinary ¶
IsBinary detects if data is a binary value based on: http://git.kernel.org/cgit/git/git.git/tree/xdiff-interface.c?id=HEAD#n198
func IsConfiguration ¶
IsConfiguration reports whether path is a file in one of the configuration languages.
func IsDocumentation ¶
IsDocumentation returns whether or not path is a documentation path.
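Path-based checks like this one are typically pattern matches against the path. The sketch below shows the general idea with two illustrative regular expressions; `docPatterns` and `looksLikeDocumentation` are hypothetical, and the patterns are not the ones enry generates from Linguist.

```go
package main

import (
	"fmt"
	"regexp"
)

// docPatterns holds illustrative patterns only; the real rule set is assumed
// to be larger and generated from upstream data.
var docPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(^|/)[Dd]ocs?/`),
	regexp.MustCompile(`(^|/)(README|CHANGELOG|LICENSE)(\.|$)`),
}

// looksLikeDocumentation reports whether path matches any pattern.
func looksLikeDocumentation(path string) bool {
	for _, re := range docPatterns {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"docs/index.md", "pkg/README.md", "src/main.go"} {
		fmt.Println(p, looksLikeDocumentation(p))
	}
}
```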
Types ¶
type Classifier ¶ added in v1.2.1
type Classifier interface {
Classify(content []byte, candidates map[string]float64) (languages []string)
}
Classifier is the interface responsible for detecting the possible languages of the given content based on a set of candidates. Candidates is a map that can be used to assign weights to languages dynamically.
var DefaultClassifier Classifier = &classifier{
	languagesLogProbabilities: data.LanguagesLogProbabilities,
	tokensLogProbabilities:    data.TokensLogProbabilities,
	tokensTotal:               data.TokensTotal,
}
DefaultClassifier is a Naive Bayes classifier trained on Linguist samples.
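To show what satisfying the Classifier interface might look like, here is a deliberately simple stand-in. The interface is repeated locally so the sketch compiles without enry; `keywordClassifier` and its keyword table are hypothetical and far cruder than a Naive Bayes model. Treating the candidate weight as a starting score is an assumption of the sketch.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Classifier repeats the interface shown above so the sketch is self-contained.
type Classifier interface {
	Classify(content []byte, candidates map[string]float64) (languages []string)
}

// keywordClassifier is a hypothetical classifier that scores each candidate
// by its weight plus the number of keyword occurrences in the content.
type keywordClassifier struct {
	keywords map[string][]string
}

func (k keywordClassifier) Classify(content []byte, candidates map[string]float64) []string {
	scores := make(map[string]float64, len(candidates))
	languages := make([]string, 0, len(candidates))
	for lang, weight := range candidates {
		score := weight
		for _, kw := range k.keywords[lang] {
			score += float64(bytes.Count(content, []byte(kw)))
		}
		scores[lang] = score
		languages = append(languages, lang)
	}
	// Highest score first; ties are broken alphabetically for stable output.
	sort.Slice(languages, func(i, j int) bool {
		a, b := languages[i], languages[j]
		if scores[a] != scores[b] {
			return scores[a] > scores[b]
		}
		return a < b
	})
	return languages
}

func main() {
	var c Classifier = keywordClassifier{keywords: map[string][]string{
		"Go":     {"func ", "package "},
		"Python": {"def ", "import "},
	}}
	src := []byte("package main\nfunc main() {}\n")
	fmt.Println(c.Classify(src, map[string]float64{"Go": 0, "Python": 0}))
}
```

A value like this could be passed wherever a Classifier is accepted, for example as the classifier argument of the GetLanguagesBySpecificClassifier signature listed in the index.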
type Strategy ¶ added in v1.2.1
Strategy fixes the signature of the functions that can be used as a strategy.
type Type ¶
type Type int
Type represents a language's type: data, programming, markup, prose, or unknown.
func GetLanguageType ¶
GetLanguageType returns the type of the given language.
Directories ¶
Path | Synopsis |
---|---|
benchmarks | |
cmd | |
data | Package data contains only auto-generated data-structures for all the language identification strategies from the Linguist project sources. |
rule | Package rule contains rule-based heuristic implementations. |
internal | |
code-generator/generator | Package generator provides facilities to generate Go code for the package data in enry from YAML files describing supported languages in Linguist. |
tokenizer | Package tokenizer implements file tokenization used by the enry content classifier. |