Documentation ¶
Overview ¶
Package classifier provides the implementation of the v2 license classifier.
Index ¶
- func LicenseName(in string) string
- type Classifier
- func NewClassifier(threshold float64) *Classifier
- func (c *Classifier) AddContent(category, name, variant string, content []byte)
- func (c *Classifier) LoadLicenses(dir string) error
- func (c *Classifier) Match(in []byte) Results
- func (c *Classifier) MatchFrom(in io.Reader) (Results, error)
- func (c *Classifier) Normalize(in []byte) []byte
- func (c *Classifier) SetTraceConfiguration(in *TraceConfiguration)
- type Match
- type Matches
- type Results
- type TraceConfiguration
- type TraceFunc
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func LicenseName ¶
func LicenseName(in string) string
LicenseName produces the output name for a license, removing the internal structure of the filename in use.
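As a rough illustration of what "removing the internal structure of the filename" could mean, the sketch below assumes corpus files are named like "Name_variant.txt". That naming scheme and the helper licenseNameSketch are assumptions made for this example; they are not taken from the package source.

```go
package main

import (
	"fmt"
	"strings"
)

// licenseNameSketch is a hypothetical stand-in for LicenseName. It assumes
// file names of the form "<Name>_<variant>.txt" and keeps only "<Name>".
func licenseNameSketch(in string) string {
	// Drop the assumed ".txt" extension, if present.
	out := strings.TrimSuffix(in, ".txt")
	// Drop everything from the first underscore onward (the assumed variant part).
	if idx := strings.Index(out, "_"); idx != -1 {
		out = out[:idx]
	}
	return out
}

func main() {
	fmt.Println(licenseNameSketch("Apache-2.0_header.txt")) // Apache-2.0
	fmt.Println(licenseNameSketch("MIT.txt"))               // MIT
}
```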
Types ¶
type Classifier ¶
type Classifier struct {
// contains filtered or unexported fields
}
Classifier provides methods for identifying open source licenses in text content.
func NewClassifier ¶
func NewClassifier(threshold float64) *Classifier
NewClassifier creates a classifier with an empty corpus.
func (*Classifier) AddContent ¶
func (c *Classifier) AddContent(category, name, variant string, content []byte)
AddContent incorporates the provided textual content into the classifier for matching. This will not modify the supplied content.
func (*Classifier) LoadLicenses ¶
func (c *Classifier) LoadLicenses(dir string) error
LoadLicenses adds the contents of the supplied directory to the corpus of the classifier.
func (*Classifier) Match ¶
func (c *Classifier) Match(in []byte) Results
Match finds matches within an unknown text. This will not modify the contents of the supplied byte slice.
func (*Classifier) MatchFrom ¶
func (c *Classifier) MatchFrom(in io.Reader) (Results, error)
MatchFrom finds matches within the read content.
func (*Classifier) Normalize ¶
func (c *Classifier) Normalize(in []byte) []byte
Normalize takes input content and applies the following transforms to aid in identifying license content. The return value of this function is line-separated text which is the basis for position values returned by the classifier.
1. Breaks up long lines of text. This helps with detecting licenses like in TODO(wcn):URL reference
2. Removes certain ignorable text to aid in matching blocks of text. Introductory lines such as "The MIT License" are removed, as are copyright notices, since the named parties vary and shouldn't affect matching.
It is NOT necessary to call this function simply to identify licenses in a file. Call it only to help present this information to the user in context (for example, when creating diffs against canonical licenses).
It is an invariant of the classifier that calling Match(Normalize(in)) will return the same results as Match(in).
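The two transforms can be pictured with a much-simplified sketch. The line width, the exact patterns treated as ignorable, and the helper names normalizeSketch and wrap are all assumptions chosen for illustration; the classifier's real rules are not shown in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLineLen is an arbitrary width chosen for this sketch.
const maxLineLen = 40

// normalizeSketch is a hypothetical, simplified illustration of the two
// transforms described above. It is not the classifier's implementation.
func normalizeSketch(in string) string {
	var out []string
	for _, line := range strings.Split(in, "\n") {
		trimmed := strings.TrimSpace(line)
		lower := strings.ToLower(trimmed)
		// Transform 2: drop ignorable lines (assumed patterns).
		if lower == "the mit license" || strings.HasPrefix(lower, "copyright ") {
			continue
		}
		// Transform 1: break long lines at word boundaries.
		out = append(out, wrap(trimmed, maxLineLen)...)
	}
	return strings.Join(out, "\n")
}

// wrap splits s into lines of at most width bytes where word lengths allow.
// An empty input yields a single empty line so blank lines are preserved.
func wrap(s string, width int) []string {
	words := strings.Fields(s)
	if len(words) == 0 {
		return []string{""}
	}
	var lines []string
	cur := words[0]
	for _, w := range words[1:] {
		if len(cur)+1+len(w) > width {
			lines = append(lines, cur)
			cur = w
			continue
		}
		cur += " " + w
	}
	return append(lines, cur)
}

func main() {
	in := "The MIT License\n" +
		"Copyright 2020 Example Author\n" +
		"Permission is hereby granted, free of charge, to any person obtaining a copy"
	fmt.Println(normalizeSketch(in))
}
```

Line numbers in the output of such a function differ from those of the input, which is why position values are defined relative to the normalized text rather than the original.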
func (*Classifier) SetTraceConfiguration ¶
func (c *Classifier) SetTraceConfiguration(in *TraceConfiguration)
SetTraceConfiguration installs a tracing configuration for the classifier.
type Match ¶
type Match struct {
	Name            string
	Confidence      float64
	MatchType       string
	Variant         string
	StartLine       int
	EndLine         int
	StartTokenIndex int
	EndTokenIndex   int
}
Match is the information about a single instance of a detected match.
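A caller will often want to keep only high-confidence matches and report them in document order. The sketch below redeclares a struct with the same fields as Match so that it compiles on its own; the helper confident and the sample values (including the "License" MatchType strings) are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Match mirrors the documented fields. It is redeclared here only so the
// sketch is self-contained.
type Match struct {
	Name            string
	Confidence      float64
	MatchType       string
	Variant         string
	StartLine       int
	EndLine         int
	StartTokenIndex int
	EndTokenIndex   int
}

// confident returns the matches whose Confidence is at least min,
// ordered by StartLine. The input slice is left unmodified.
func confident(ms []Match, min float64) []Match {
	var out []Match
	for _, m := range ms {
		if m.Confidence >= min {
			out = append(out, m)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].StartLine < out[j].StartLine
	})
	return out
}

func main() {
	// Sample data invented for this example.
	ms := []Match{
		{Name: "MIT", Confidence: 0.99, MatchType: "License", StartLine: 40, EndLine: 58},
		{Name: "Apache-2.0", Confidence: 0.62, MatchType: "License", StartLine: 1, EndLine: 12},
		{Name: "BSD-3-Clause", Confidence: 0.91, MatchType: "License", StartLine: 3, EndLine: 27},
	}
	for _, m := range confident(ms, 0.9) {
		fmt.Printf("%s lines %d-%d (%.2f)\n", m.Name, m.StartLine, m.EndLine, m.Confidence)
	}
}
```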
type TraceConfiguration ¶
type TraceConfiguration struct {
	// Comma-separated list of phases to be traced. Can use * for all phases.
	TracePhases string
	// Comma-separated list of licenses to be traced. Can use * as a suffix to
	// match prefixes, or by itself to match all licenses.
	TraceLicenses string
	// Tracer specifies a TraceFunc used to capture tracing information.
	// If not supplied, emits using fmt.Printf
	Tracer TraceFunc
	// contains filtered or unexported fields
}
TraceConfiguration specifies the configuration for tracing execution of the license classifier.
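The pattern rules described for TraceLicenses (a comma-separated list, "*" alone for everything, a trailing "*" for prefix matching) can be sketched as a small selector. The function shouldTrace is hypothetical, and details such as trimming whitespace around entries are assumptions rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldTrace reports whether name is selected by a comma-separated pattern
// list. A pattern of "*" selects everything; a trailing "*" selects by
// prefix; anything else must match exactly. This is a sketch of the
// documented semantics, not the package's code.
func shouldTrace(patterns, name string) bool {
	for _, p := range strings.Split(patterns, ",") {
		p = strings.TrimSpace(p) // assumption: surrounding spaces are ignored
		switch {
		case p == "":
			continue // skip empty entries
		case p == "*":
			return true
		case strings.HasSuffix(p, "*"):
			if strings.HasPrefix(name, strings.TrimSuffix(p, "*")) {
				return true
			}
		case p == name:
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldTrace("MIT,Apache*", "Apache-2.0"))
	fmt.Println(shouldTrace("MIT", "MIT-0"))
	fmt.Println(shouldTrace("*", "BSD-3-Clause"))
}
```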