Documentation ¶
Overview ¶
Package enry implements multiple strategies for programming language identification.
Identification is based on file name and file content, using a series of strategies to narrow down the possible options. Each strategy is available as a separate API call, as well as through the main entry point:
GetLanguage(filename string, content []byte) (language string)
It is a port of https://github.com/github/linguist from Ruby. Upstream Linguist YAML files are used to generate the data structures for the data package.
Index ¶
- Constants
- Variables
- func GetColor(language string) string
- func GetLanguage(filename string, content []byte) (language string)
- func GetLanguageByAlias(alias string) (lang string, ok bool)
- func GetLanguageByClassifier(content []byte, candidates []string) (language string, safe bool)
- func GetLanguageByContent(filename string, content []byte) (language string, safe bool)
- func GetLanguageByEmacsModeline(content []byte) (language string, safe bool)
- func GetLanguageByExtension(filename string) (language string, safe bool)
- func GetLanguageByFilename(filename string) (language string, safe bool)
- func GetLanguageByModeline(content []byte) (language string, safe bool)
- func GetLanguageByShebang(content []byte) (language string, safe bool)
- func GetLanguageBySpecificClassifier(content []byte, candidates []string, classifier Classifier) (language string, safe bool)
- func GetLanguageByVimModeline(content []byte) (language string, safe bool)
- func GetLanguageExtensions(language string) []string
- func GetLanguages(filename string, content []byte) []string
- func GetLanguagesByClassifier(filename string, content []byte, candidates []string) (languages []string)
- func GetLanguagesByContent(filename string, content []byte, _ []string) []string
- func GetLanguagesByEmacsModeline(_ string, content []byte, _ []string) []string
- func GetLanguagesByExtension(filename string, _ []byte, _ []string) []string
- func GetLanguagesByFilename(filename string, _ []byte, _ []string) []string
- func GetLanguagesByModeline(_ string, content []byte, candidates []string) []string
- func GetLanguagesByShebang(_ string, content []byte, _ []string) (languages []string)
- func GetLanguagesBySpecificClassifier(content []byte, candidates []string, classifier Classifier) (languages []string)
- func GetLanguagesByVimModeline(_ string, content []byte, _ []string) []string
- func GetMIMEType(path string, language string) string
- func IsBinary(data []byte) bool
- func IsConfiguration(path string) bool
- func IsDocumentation(path string) bool
- func IsDotFile(path string) bool
- func IsImage(path string) bool
- func IsVendor(path string) bool
- type Classifier
- type Strategy
- type Type
Constants ¶
const OtherLanguage = ""
OtherLanguage is used as a zero value when a function cannot return a specific language.
Variables ¶
var DefaultStrategies = []Strategy{
	GetLanguagesByModeline,
	GetLanguagesByFilename,
	GetLanguagesByShebang,
	GetLanguagesByExtension,
	GetLanguagesByContent,
	GetLanguagesByClassifier,
}
DefaultStrategies is a sequence of strategies used by GetLanguage to detect languages.
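A chain of strategies like this can be sketched in a few lines. The example below is a minimal, self-contained illustration and not the package's actual implementation: the local Strategy type is assumed from the signatures of the GetLanguagesBy* functions, and byExtension, byContent, and detect are hypothetical names with toy data.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Strategy is assumed here from the shape of the GetLanguagesBy* functions:
// it receives the candidates found so far and returns its own guesses.
type Strategy func(filename string, content []byte, candidates []string) []string

// byExtension is a hypothetical strategy backed by a toy extension table.
func byExtension(filename string, _ []byte, _ []string) []string {
	table := map[string][]string{
		".go": {"Go"},
		".h":  {"C", "C++", "Objective-C"},
	}
	return table[strings.ToLower(filepath.Ext(filename))]
}

// byContent is a hypothetical strategy that narrows the incoming candidates
// with a trivial content check.
func byContent(_ string, content []byte, candidates []string) []string {
	if !strings.Contains(string(content), "@interface") {
		return nil
	}
	for _, c := range candidates {
		if c == "Objective-C" {
			return []string{c}
		}
	}
	return nil
}

// detect runs the strategies in order. It stops as soon as one strategy
// returns exactly one language; otherwise it carries the latest non-empty
// result forward as the candidate list for the next strategy.
func detect(strategies []Strategy, filename string, content []byte) []string {
	var languages []string
	for _, s := range strategies {
		found := s(filename, content, languages)
		if len(found) == 1 {
			return found
		}
		if len(found) > 0 {
			languages = found
		}
	}
	return languages
}

func main() {
	chain := []Strategy{byExtension, byContent}
	fmt.Println(detect(chain, "main.go", nil))
	fmt.Println(detect(chain, "view.h", []byte("@interface View : NSObject")))
	fmt.Println(detect(chain, "util.h", []byte("int add(int a, int b);")))
}
```

In this sketch an unambiguous extension ends the search immediately, while an ambiguous one leaves several candidates for the later, more expensive strategies to narrow down.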
Functions ¶
func GetLanguage ¶
GetLanguage applies a sequence of strategies based on the given filename and content to find the most probable language.
func GetLanguageByAlias ¶
GetLanguageByAlias returns the language related to the given alias with ok set to true, or OtherLanguage with ok set to false if the alias is not recognized.
func GetLanguageByClassifier ¶
GetLanguageByClassifier returns the most probable language detected for the given content. It uses DefaultClassifier; if no candidates are provided, it returns OtherLanguage.
func GetLanguageByContent ¶
GetLanguageByContent returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByEmacsModeline ¶
GetLanguageByEmacsModeline returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByExtension ¶
GetLanguageByExtension returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByFilename ¶
GetLanguageByFilename returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByModeline ¶
GetLanguageByModeline returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByShebang ¶
GetLanguageByShebang returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageBySpecificClassifier ¶
func GetLanguageBySpecificClassifier(content []byte, candidates []string, classifier Classifier) (language string, safe bool)
GetLanguageBySpecificClassifier returns the most probable language for the given content, using the given classifier to detect the language.
func GetLanguageByVimModeline ¶
GetLanguageByVimModeline returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
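The GetLanguageBy* functions above share one convention: a single language is returned together with a safe flag that is true only when the result is unambiguous. A minimal sketch of that convention follows; firstAndSafe and otherLanguage are hypothetical names for illustration, and the explicit sort is an assumption made so the example does not depend on the input order.

```go
package main

import (
	"fmt"
	"sort"
)

// otherLanguage stands in for the package's OtherLanguage constant.
const otherLanguage = ""

// firstAndSafe reduces a slice of possible languages to a single language.
// safe is true only when exactly one language was possible.
func firstAndSafe(languages []string) (language string, safe bool) {
	if len(languages) == 0 {
		return otherLanguage, false
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]string(nil), languages...)
	sort.Strings(sorted)
	return sorted[0], len(sorted) == 1
}

func main() {
	lang, safe := firstAndSafe([]string{"Go"})
	fmt.Printf("%q %v\n", lang, safe)

	lang, safe = firstAndSafe([]string{"Objective-C", "C++", "C"})
	fmt.Printf("%q %v\n", lang, safe)

	lang, safe = firstAndSafe(nil)
	fmt.Printf("%q %v\n", lang, safe)
}
```

Callers that need certainty can check safe and fall back to one of the GetLanguagesBy* variants to inspect every candidate.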
func GetLanguageExtensions ¶
GetLanguageExtensions returns the different extensions being used by the language.
func GetLanguages ¶
GetLanguages applies a sequence of strategies based on the given filename and content to find the most probable languages. At least one of the arguments should be set. If content is missing, language detection is based on the filename alone. Given empty content, the function does not read the file.
func GetLanguagesByClassifier ¶
func GetLanguagesByClassifier(filename string, content []byte, candidates []string) (languages []string)
GetLanguagesByClassifier uses DefaultClassifier as a Classifier and returns a slice of possible languages sorted by decreasing probability. If there are no candidates, it returns nil. It complies with the signature to be a Strategy type.
func GetLanguagesByContent ¶
GetLanguagesByContent returns a slice of languages for the given content. It is a Strategy that uses content-based regexp heuristics and a filename extension.
func GetLanguagesByEmacsModeline ¶
GetLanguagesByEmacsModeline returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetLanguagesByExtension ¶
GetLanguagesByExtension returns a slice of possible languages for the given filename. It complies with the signature to be a Strategy type.
func GetLanguagesByFilename ¶
GetLanguagesByFilename returns a slice of possible languages for the given filename. It complies with the signature to be a Strategy type.
func GetLanguagesByModeline ¶
GetLanguagesByModeline returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetLanguagesByShebang ¶
GetLanguagesByShebang returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetLanguagesBySpecificClassifier ¶
func GetLanguagesBySpecificClassifier(content []byte, candidates []string, classifier Classifier) (languages []string)
GetLanguagesBySpecificClassifier returns a slice of possible languages. It takes in a Classifier to be used.
func GetLanguagesByVimModeline ¶
GetLanguagesByVimModeline returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetMIMEType ¶
GetMIMEType returns the MIME type of a given file based on its language.
func IsBinary ¶
IsBinary detects if data is a binary value based on: http://git.kernel.org/cgit/git/git.git/tree/xdiff-interface.c?id=HEAD#n198
func IsConfiguration ¶
IsConfiguration tells if filename is in one of the configuration languages.
func IsDocumentation ¶
IsDocumentation returns whether or not path is a documentation path.
Types ¶
type Classifier ¶
type Classifier interface {
Classify(content []byte, candidates map[string]float64) (languages []string)
}
Classifier is the interface responsible for detecting the possible languages of the given content based on a set of candidates. Candidates is a map which can be used to assign weights to languages dynamically.
var DefaultClassifier Classifier = &classifier{
	languagesLogProbabilities: data.LanguagesLogProbabilities,
	tokensLogProbabilities:    data.TokensLogProbabilities,
	tokensTotal:               data.TokensTotal,
}
DefaultClassifier is a Naive Bayes classifier trained on Linguist samples.
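To make the Naive Bayes idea concrete, here is a toy implementation of the Classifier interface. It is a sketch under stated assumptions, not the package's classifier: toyClassifier and newToyClassifier are hypothetical, every probability is invented, tokenization is a plain whitespace split, and the candidate weights are ignored for simplicity.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// toyClassifier is a hypothetical Naive Bayes model with made-up numbers.
type toyClassifier struct {
	languageLogProb map[string]float64            // log P(language)
	tokenLogProb    map[string]map[string]float64 // log P(token | language)
	unseenLogProb   float64                       // fallback for unknown entries
}

// newToyClassifier builds a two-language model for demonstration only.
func newToyClassifier() *toyClassifier {
	return &toyClassifier{
		languageLogProb: map[string]float64{"Go": math.Log(0.5), "Python": math.Log(0.5)},
		tokenLogProb: map[string]map[string]float64{
			"Go":     {"func": math.Log(0.4), "def": math.Log(0.01)},
			"Python": {"def": math.Log(0.4), "func": math.Log(0.01)},
		},
		unseenLogProb: math.Log(0.001),
	}
}

// Classify scores each candidate as log P(language) plus the sum of
// log P(token | language), then returns the candidates by decreasing score.
// This sketch ignores the weights stored in the candidates map.
func (c *toyClassifier) Classify(content []byte, candidates map[string]float64) []string {
	tokens := strings.Fields(string(content))
	scores := make(map[string]float64, len(candidates))
	languages := make([]string, 0, len(candidates))
	for lang := range candidates {
		score, ok := c.languageLogProb[lang]
		if !ok {
			score = c.unseenLogProb
		}
		for _, tok := range tokens {
			if lp, ok := c.tokenLogProb[lang][tok]; ok {
				score += lp
			} else {
				score += c.unseenLogProb
			}
		}
		scores[lang] = score
		languages = append(languages, lang)
	}
	sort.Slice(languages, func(i, j int) bool {
		if scores[languages[i]] != scores[languages[j]] {
			return scores[languages[i]] > scores[languages[j]]
		}
		return languages[i] < languages[j] // deterministic tie-break
	})
	return languages
}

func main() {
	c := newToyClassifier()
	candidates := map[string]float64{"Go": 1, "Python": 1}
	fmt.Println(c.Classify([]byte("func main"), candidates))
	fmt.Println(c.Classify([]byte("def main"), candidates))
}
```

Working with log probabilities turns the product of many small per-token probabilities into a sum, which avoids floating-point underflow on long inputs.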
type Type ¶
type Type int
Type represents a language's type: data, programming, markup, prose, or unknown.
func GetLanguageType ¶
GetLanguageType returns the type of the given language.
Directories ¶
Path | Synopsis |
---|---|
benchmarks | |
cmd | |
data | Package data contains only auto-generated data structures for all the language identification strategies from the Linguist project sources. |
rule | Package rule contains rule-based heuristic implementations. |
internal | |
code-generator/generator | Package generator provides facilities to generate Go code for the package data in enry from YAML files describing supported languages in Linguist. |
tokenizer | Package tokenizer implements file tokenization used by the enry content classifier. |