tools

command
v0.0.0-...-b50cce2
Published: Mar 28, 2019 License: Apache-2.0 Imports: 17 Imported by: 0

Documentation

Overview

The identify_license program tries to identify the license type of an unknown license. The files containing the license text are given on the command line, so several license files can be analyzed with a single command. For each file, the program reports the license type along with the confidence level of the match. The confidence level ranges from 0.0 to 1.0, where 1.0 indicates an exact match and 0.0 indicates a complete mismatch. Results are sorted by confidence level, highest first.

$ identify_license LICENSE1 LICENSE2
LICENSE2: MIT (confidence: 0.987)
LICENSE1: BSD-2-Clause (confidence: 0.833)
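
The ordering and output format shown above can be sketched in Go as follows. This is only an illustration: the `match` type and the `sortByConfidence` and `formatMatch` helpers are hypothetical names chosen for this example and are not taken from the tool's source.

```go
package main

import (
	"fmt"
	"sort"
)

// match is a hypothetical result record; the real tool's types may differ.
type match struct {
	File       string
	License    string
	Confidence float64 // 0.0 (complete mismatch) to 1.0 (exact match)
}

// sortByConfidence orders matches from highest to lowest confidence.
func sortByConfidence(ms []match) {
	sort.SliceStable(ms, func(i, j int) bool {
		return ms[i].Confidence > ms[j].Confidence
	})
}

// formatMatch renders one match in the style of the sample output.
func formatMatch(m match) string {
	return fmt.Sprintf("%s: %s (confidence: %.3f)", m.File, m.License, m.Confidence)
}

func main() {
	ms := []match{
		{"LICENSE1", "BSD-2-Clause", 0.833},
		{"LICENSE2", "MIT", 0.987},
	}
	sortByConfidence(ms)
	for _, m := range ms {
		fmt.Println(formatMatch(m))
	}
}
```

A stable sort is used here so that files with equal confidence keep their command-line order; whether the actual program does the same is an assumption.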

The license_serializer program normalizes the known licenseclassifier licenses and serializes them into a compressed archive. Hash values for the licenses are calculated and added to the archive. These hashes can then be used to find offsets in unknown text that are good starting points for the Levenshtein distance algorithm.
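
A minimal sketch of the normalize, hash, and archive steps is given below. Everything in it is an assumption made for illustration: the whitespace and case normalization, the SHA-256 hash, the gzip-compressed tar layout, and the `.txt`/`.hash` entry names are stand-ins, not the format or hashing scheme that license_serializer actually uses.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// normalize lowercases the text and collapses runs of whitespace.
// This is a simplification, not the classifier's actual normalizer.
func normalize(text string) string {
	return strings.Join(strings.Fields(strings.ToLower(text)), " ")
}

// addFile appends one in-memory file to the tar stream.
func addFile(tw *tar.Writer, name string, data []byte) error {
	hdr := &tar.Header{Name: name, Mode: 0644, Size: int64(len(data))}
	if err := tw.WriteHeader(hdr); err != nil {
		return err
	}
	_, err := tw.Write(data)
	return err
}

// buildArchive writes each normalized license and its hash (hypothetical
// entry names) into a gzip-compressed tar archive held in memory.
func buildArchive(licenses map[string]string) ([]byte, error) {
	names := make([]string, 0, len(licenses))
	for name := range licenses {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic entry order

	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for _, name := range names {
		text := normalize(licenses[name])
		sum := sha256.Sum256([]byte(text))
		if err := addFile(tw, name+".txt", []byte(text)); err != nil {
			return nil, err
		}
		if err := addFile(tw, name+".hash", []byte(hex.EncodeToString(sum[:]))); err != nil {
			return nil, err
		}
	}
	// Close the tar writer first so its trailer is flushed into gzip.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Placeholder license text, for illustration only.
	data, err := buildArchive(map[string]string{"MIT": "Permission is hereby  granted..."})
	if err != nil {
		panic(err)
	}
	fmt.Printf("archive size: %d bytes\n", len(data))
}
```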

The license_word_count program counts how often each word appears in the known licenses. This information is useful for being more selective about which files are run through the license classifier.
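
The kind of frequency count described above might look like the following sketch. The tokenization rule (split on anything that is not a letter or digit, after lowercasing) and the `countWords` and `topWords` helpers are assumptions for this example, not the program's actual behavior, and the input strings are placeholder fragments.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// countWords tallies lowercased words across all given texts.
func countWords(texts []string) map[string]int {
	counts := make(map[string]int)
	for _, t := range texts {
		words := strings.FieldsFunc(strings.ToLower(t), func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsDigit(r)
		})
		for _, w := range words {
			counts[w]++
		}
	}
	return counts
}

// topWords returns up to n words, most frequent first,
// with ties broken alphabetically.
func topWords(counts map[string]int, n int) []string {
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Slice(words, func(i, j int) bool {
		if counts[words[i]] != counts[words[j]] {
			return counts[words[i]] > counts[words[j]]
		}
		return words[i] < words[j]
	})
	if len(words) > n {
		words = words[:n]
	}
	return words
}

func main() {
	counts := countWords([]string{
		"Permission is hereby granted, free of charge",
		"Redistribution and use in source and binary forms",
	})
	for _, w := range topWords(counts, 3) {
		fmt.Printf("%s: %d\n", w, counts[w])
	}
}
```

One way to use such counts for pre-filtering would be to skip files that contain few of the most frequent license words, though the exact selection rule is left open here.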
