Documentation ¶
Overview ¶
Package enry implements multiple strategies for programming language identification.
Identification is based on the file name and file content, using a series of strategies to narrow down the possible options. Each strategy is available as a separate API call, in addition to the main entry point:
GetLanguage(filename string, content []byte) (language string)
It is a port of https://github.com/github/linguist from Ruby. Upstream Linguist YAML files are used to generate the data structures for the data package.
Index ¶
- Constants
- Variables
- func GetColor(language string) string
- func GetLanguage(filename string, content []byte) (language string)
- func GetLanguageByAlias(alias string) (lang string, ok bool)
- func GetLanguageByClassifier(content []byte, candidates []string) (language string, safe bool)
- func GetLanguageByContent(filename string, content []byte) (language string, safe bool)
- func GetLanguageByEmacsModeline(content []byte) (language string, safe bool)
- func GetLanguageByExtension(filename string) (language string, safe bool)
- func GetLanguageByFilename(filename string) (language string, safe bool)
- func GetLanguageByModeline(content []byte) (language string, safe bool)
- func GetLanguageByShebang(content []byte) (language string, safe bool)
- func GetLanguageByVimModeline(content []byte) (language string, safe bool)
- func GetLanguageExtensions(language string) []string
- func GetLanguageGroup(language string) string
- func GetLanguageID(language string) (int, bool)
- func GetLanguageInfo(language string) (data.LanguageInfo, error)
- func GetLanguageInfoByID(id int) (data.LanguageInfo, error)
- func GetLanguages(filename string, content []byte) []string
- func GetLanguagesByClassifier(filename string, content []byte, candidates []string) (languages []string)
- func GetLanguagesByContent(filename string, content []byte, _ []string) []string
- func GetLanguagesByEmacsModeline(_ string, content []byte, _ []string) []string
- func GetLanguagesByExtension(filename string, _ []byte, _ []string) []string
- func GetLanguagesByFilename(filename string, _ []byte, _ []string) []string
- func GetLanguagesByManpage(filename string, _ []byte, _ []string) []string
- func GetLanguagesByModeline(_ string, content []byte, candidates []string) []string
- func GetLanguagesByShebang(_ string, content []byte, _ []string) (languages []string)
- func GetLanguagesByVimModeline(_ string, content []byte, _ []string) []string
- func GetLanguagesByXML(_ string, content []byte, candidates []string) []string
- func GetMIMEType(path string, language string) string
- func IsBinary(data []byte) bool
- func IsConfiguration(path string) bool
- func IsDocumentation(path string) bool
- func IsDotFile(path string) bool
- func IsGenerated(path string, content []byte) bool
- func IsImage(path string) bool
- func IsTest(path string) bool
- func IsVendor(path string) bool
- type Strategy
- type Type
Constants ¶
const (
	Unknown     Type = Type(data.TypeUnknown)
	Data             = Type(data.TypeData)
	Programming      = Type(data.TypeProgramming)
	Markup           = Type(data.TypeMarkup)
	Prose            = Type(data.TypeProse)
)
Type's values.
const OtherLanguage = ""
OtherLanguage is used as a zero value when a function cannot return a specific language.
Variables ¶
var DefaultStrategies = []Strategy{
	GetLanguagesByModeline,
	GetLanguagesByFilename,
	GetLanguagesByShebang,
	GetLanguagesByExtension,
	GetLanguagesByXML,
	GetLanguagesByManpage,
	GetLanguagesByContent,
	GetLanguagesByClassifier,
}
DefaultStrategies is a sequence of strategies used by GetLanguage to detect languages.
Functions ¶
func GetLanguage ¶
GetLanguage applies a sequence of strategies based on the given filename and content to find out the most probable language to return.
func GetLanguageByAlias ¶
GetLanguageByAlias returns either the language related to the given alias with ok set to true, or OtherLanguage with ok set to false if the alias is not recognized.
func GetLanguageByClassifier ¶
GetLanguageByClassifier returns the most probable language detected for the given content. It uses defaultClassifier. If no candidates are provided, it returns OtherLanguage.
func GetLanguageByContent ¶
GetLanguageByContent returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByEmacsModeline ¶
GetLanguageByEmacsModeline returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByExtension ¶
GetLanguageByExtension returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByFilename ¶
GetLanguageByFilename returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByModeline ¶
GetLanguageByModeline returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByShebang ¶
GetLanguageByShebang returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
func GetLanguageByVimModeline ¶
GetLanguageByVimModeline returns the detected language. If there is more than one possible language, it returns the first language in alphabetical order and sets safe to false.
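The GetLanguageBy* functions above share one return convention: a single language plus a safe flag. The sketch below illustrates that convention only. firstAndSafe is a hypothetical helper, not part of enry, and it assumes that safe is true exactly when there was a single candidate.

```go
package main

import (
	"fmt"
	"sort"
)

// firstAndSafe is a hypothetical helper (not enry API). It returns the
// alphabetically first candidate. safe is true only when there was exactly
// one candidate, which is an assumption made for this sketch.
func firstAndSafe(candidates []string) (language string, safe bool) {
	if len(candidates) == 0 {
		return "", false // "" plays the role of OtherLanguage
	}
	sorted := append([]string(nil), candidates...) // copy, keep input intact
	sort.Strings(sorted)
	return sorted[0], len(sorted) == 1
}

func main() {
	lang, safe := firstAndSafe([]string{"Objective-C", "C", "C++"})
	fmt.Println(lang, safe) // C false

	lang, safe = firstAndSafe([]string{"Go"})
	fmt.Println(lang, safe) // Go true
}
```

Callers that need certainty can treat a false safe value as a signal to try another strategy or to fall back to the multi-result GetLanguagesBy* variants.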
func GetLanguageExtensions ¶
GetLanguageExtensions returns all extensions associated with the given language.
func GetLanguageGroup ¶
GetLanguageGroup returns the language group, or an empty string if the language does not have a group.
func GetLanguageID ¶
GetLanguageID returns the ID for the language. IDs are assigned by GitHub. The input must be the canonical language name. Aliases are not supported.
NOTE: The zero value (0) is a valid language ID, so this API mimics the Go map API. Use the second return value to check if the language was found.
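The comma-ok pattern mentioned in the note can be sketched with a plain map. The table and the language names below are made up for illustration and do not reflect real Linguist IDs; lookupID is a hypothetical stand-in with the same return shape as GetLanguageID.

```go
package main

import "fmt"

// languageIDs is a hypothetical table for illustration only. The names and
// numbers are invented and are not real Linguist language IDs.
var languageIDs = map[string]int{
	"ExampleZero":  0,
	"ExampleOther": 7,
}

// lookupID is a hypothetical stand-in with the same (int, bool) shape.
func lookupID(language string) (int, bool) {
	id, ok := languageIDs[language]
	return id, ok
}

func main() {
	id, ok := lookupID("ExampleZero")
	fmt.Println(id, ok) // 0 true

	id, ok = lookupID("Missing")
	fmt.Println(id, ok) // 0 false
}
```

Both calls return 0 as the ID, so only the boolean distinguishes a language whose ID is zero from a language that was not found.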
func GetLanguageInfo ¶
func GetLanguageInfo(language string) (data.LanguageInfo, error)
GetLanguageInfo returns the LanguageInfo for a given language name, or an error if not found.
func GetLanguageInfoByID ¶
func GetLanguageInfoByID(id int) (data.LanguageInfo, error)
GetLanguageInfoByID returns the LanguageInfo for a given language ID, or an error if not found.
func GetLanguages ¶
GetLanguages applies a sequence of strategies based on the given filename and content to find out the most probable languages to return.
If it finds a strategy that produces a single result, that result is returned; otherwise the result of the last strategy that returned multiple results is returned. If the content is binary, no results are returned. This matches the behavior of Linguist.detect: https://github.com/github/linguist/blob/aad49acc0624c70d654a8dce447887dbbc713c7a/lib/linguist.rb#L14-L49
At least one of the arguments should be set. If content is missing, language detection is based on the filename. Given empty content, the function does not read the file.
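The selection rule described above can be sketched as a small loop over strategies. This is an illustration under stated assumptions, not enry's implementation: the strategy type mirrors the three-argument shape shown in the index, byExtension and byContent are toy strategies invented for this example, and the NUL-byte test is a crude stand-in for a binary check.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// strategy mirrors the (filename, content, candidates) shape from the index.
type strategy func(filename string, content []byte, candidates []string) []string

// detect returns the first single result, else the last multi-result.
func detect(filename string, content []byte, strategies []strategy) []string {
	if bytes.IndexByte(content, 0) != -1 { // crude binary check (assumption)
		return nil
	}
	var candidates []string
	for _, s := range strategies {
		langs := s(filename, content, candidates)
		switch {
		case len(langs) == 1:
			return langs // a single result ends the search
		case len(langs) > 1:
			candidates = langs // remember the latest multi-result
		}
	}
	return candidates
}

// byExtension is a toy strategy: ".h" is ambiguous between three languages.
func byExtension(filename string, _ []byte, _ []string) []string {
	if strings.HasSuffix(filename, ".h") {
		return []string{"C", "C++", "Objective-C"}
	}
	return nil
}

// byContent is a toy strategy: "@interface" is taken to mean Objective-C.
func byContent(_ string, content []byte, _ []string) []string {
	if bytes.Contains(content, []byte("@interface")) {
		return []string{"Objective-C"}
	}
	return nil
}

func main() {
	chain := []strategy{byExtension, byContent}
	fmt.Println(detect("foo.h", []byte("@interface Foo"), chain)) // [Objective-C]
	fmt.Println(detect("foo.h", nil, chain))                      // [C C++ Objective-C]
}
```

The second call shows the filename-only case: with no content, only the extension strategy produces anything, so its ambiguous result is returned as-is.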
func GetLanguagesByClassifier ¶
func GetLanguagesByClassifier(filename string, content []byte, candidates []string) (languages []string)
GetLanguagesByClassifier returns a slice of possible languages sorted by decreasing probability. If there are no candidates, it returns nil. It is a Strategy that uses a pre-trained defaultClassifier.
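The ordering contract can be sketched without the classifier itself. In the example below, rankByScore is a hypothetical helper and the scores are invented numbers standing in for whatever probabilities a classifier would assign; only the "sorted by decreasing probability, nil without candidates" behavior is taken from the description above.

```go
package main

import (
	"fmt"
	"sort"
)

// rankByScore is a hypothetical helper (not enry API). It orders candidates
// by decreasing score and returns nil when there are no candidates.
func rankByScore(scores map[string]float64, candidates []string) []string {
	if len(candidates) == 0 {
		return nil
	}
	ranked := append([]string(nil), candidates...) // copy, keep input intact
	sort.SliceStable(ranked, func(i, j int) bool {
		return scores[ranked[i]] > scores[ranked[j]]
	})
	return ranked
}

func main() {
	// Invented scores for illustration only.
	scores := map[string]float64{"C": 0.2, "C++": 0.7, "Objective-C": 0.1}
	fmt.Println(rankByScore(scores, []string{"C", "C++", "Objective-C"})) // [C++ C Objective-C]
	fmt.Println(rankByScore(scores, nil) == nil)                          // true
}
```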
func GetLanguagesByContent ¶
GetLanguagesByContent returns a slice of languages for the given content. It is a Strategy that uses content-based regexp heuristics and a filename extension.
func GetLanguagesByEmacsModeline ¶
GetLanguagesByEmacsModeline returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetLanguagesByExtension ¶
GetLanguagesByExtension returns a slice of possible languages for the given filename. It complies with the signature to be a Strategy type.
func GetLanguagesByFilename ¶
GetLanguagesByFilename returns a slice of possible languages for the given filename. It complies with the signature to be a Strategy type.
func GetLanguagesByManpage ¶
GetLanguagesByManpage returns a slice of possible manpage languages for the given filename. It complies with the signature to be a Strategy type.
func GetLanguagesByModeline ¶
GetLanguagesByModeline returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetLanguagesByShebang ¶
GetLanguagesByShebang returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetLanguagesByVimModeline ¶
GetLanguagesByVimModeline returns a slice of possible languages for the given content. It complies with the signature to be a Strategy type.
func GetLanguagesByXML ¶
GetLanguagesByXML returns a slice of possible XML languages for the given content. It complies with the signature to be a Strategy type.
func GetMIMEType ¶
GetMIMEType returns the MIME type of a given file based on its language.
func IsBinary ¶
IsBinary detects if data is a binary value based on: http://git.kernel.org/cgit/git/git.git/tree/xdiff-interface.c?id=HEAD#n198
func IsConfiguration ¶
IsConfiguration tells if filename is in one of the configuration languages.
func IsDocumentation ¶
IsDocumentation returns whether or not path is a documentation path.
func IsGenerated ¶
IsGenerated returns whether the file with the given path and content is a generated file.
Types ¶
Directories ¶
Path | Synopsis |
---|---|
benchmarks | |
cmd | |
data | Package data contains only auto-generated data structures for all the language identification strategies from the Linguist project sources. |
rule | Package rule contains rule-based heuristic implementations. |
internal | |
code-generator/generator | Package generator provides facilities to generate Go code for the package data in enry from YAML files describing supported languages in Linguist. |
tokenizer | Package tokenizer implements file tokenization used by the enry content classifier. |