Documentation ¶
Overview ¶
Package hercules contains the functions which are needed to gather various statistics from a Git repository.
The analysis is expressed as a tree: its nodes, called "pipeline items", require some other nodes to be executed before themselves and in turn provide data for dependent nodes. Several service items do not produce any useful statistics themselves but satisfy the requirements of other items. The top-level items are:
- BurndownAnalysis - line burndown statistics for project, files and developers.
- Couples - coupling statistics for files and developers.
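The dependency idea described above can be illustrated with a minimal, self-contained sketch. It is not the hercules implementation: the `item` type, the `resolve` function and the key names "day", "diff" and "changes" are hypothetical, and only "commit" and "index" are taken from the PipelineItem documentation as always-available inputs.

```go
package main

import (
	"errors"
	"fmt"
)

// item is a hypothetical stand-in for a pipeline item: it only declares
// which keys it needs and which keys it produces.
type item struct {
	name     string
	requires []string
	provides []string
}

// satisfied reports whether every key is already available.
func satisfied(keys []string, available map[string]bool) bool {
	for _, key := range keys {
		if !available[key] {
			return false
		}
	}
	return true
}

// resolve orders items so that each one runs after the items which provide
// its requirements. It fails when a requirement can never be satisfied.
func resolve(items []item) ([]string, error) {
	available := map[string]bool{"commit": true, "index": true}
	done := make([]bool, len(items))
	var order []string
	for len(order) < len(items) {
		progressed := false
		for i, it := range items {
			if done[i] || !satisfied(it.requires, available) {
				continue
			}
			for _, key := range it.provides {
				available[key] = true
			}
			done[i] = true
			order = append(order, it.name)
			progressed = true
		}
		if !progressed {
			return nil, errors.New("unsatisfied or cyclic requirements")
		}
	}
	return order, nil
}

func main() {
	items := []item{
		{"Burndown", []string{"day", "diff"}, nil},
		{"Diff", []string{"changes"}, []string{"diff"}},
		{"Days", []string{"commit"}, []string{"day"}},
		{"Changes", []string{"commit"}, []string{"changes"}},
	}
	order, err := resolve(items)
	fmt.Println(order, err)
}
```

Service items such as "Days" and "Changes" in this sketch produce no result of their own; they exist only so that the top-level item can run.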
Typical API usage starts by initializing a Pipeline:
    import "gopkg.in/src-d/go-git.v4"

    var repository *git.Repository
    // ...initialize repository...
    pipeline := hercules.NewPipeline(repository)
Then add the required analysis tree nodes:
    pipeline.AddItem(&hercules.BlobCache{})
    pipeline.AddItem(&hercules.DaysSinceStart{})
    pipeline.AddItem(&hercules.TreeDiff{})
    pipeline.AddItem(&hercules.FileDiff{})
    pipeline.AddItem(&hercules.RenameAnalysis{SimilarityThreshold: 80})
    pipeline.AddItem(&hercules.IdentityDetector{})
Then initialize BurndownAnalysis:
    burndowner := &hercules.BurndownAnalysis{
        Granularity: 30,
        Sampling:    30,
    }
    pipeline.AddItem(burndowner)
Then execute the analysis tree:
    pipeline.Initialize()
    result, err := pipeline.Run(commits)
Finally extract the result:
    burndownResults := result[burndowner].(hercules.BurndownResult)
For a complete usage example, see cmd/hercules/main.go, the source of the command line tool.
Hercules depends heavily on https://github.com/src-d/go-git and leverages the diff algorithm through https://github.com/sergi/go-diff.
Besides, hercules defines File and RBTree. These are low level data structures required by BurndownAnalysis. File carries an instance of RBTree and the current line burndown state. RBTree implements the red-black balanced binary tree and is based on https://github.com/yasushi-saito/rbtree.
Coupling statistics are meant to be processed further rather than inspected directly: labours.py uses Swivel embeddings and visualises them in TensorFlow Projector.
Index ¶
- Constants
- func LoadCommitsFromFile(path string, repository *git.Repository) ([]*object.Commit, error)
- func ParseMailmap(contents string) map[string]object.Signature
- type BlobCache
- func (self *BlobCache) Consume(deps map[string]interface{}) (map[string]interface{}, error)
- func (cache *BlobCache) Finalize() interface{}
- func (cache *BlobCache) Initialize(repository *git.Repository)
- func (cache *BlobCache) Name() string
- func (cache *BlobCache) Provides() []string
- func (cache *BlobCache) Requires() []string
- type BurndownAnalysis
- func (analyser *BurndownAnalysis) Consume(deps map[string]interface{}) (map[string]interface{}, error)
- func (analyser *BurndownAnalysis) Finalize() interface{}
- func (analyser *BurndownAnalysis) Initialize(repository *git.Repository)
- func (analyser *BurndownAnalysis) Name() string
- func (analyser *BurndownAnalysis) Provides() []string
- func (analyser *BurndownAnalysis) Requires() []string
- type BurndownResult
- type Couples
- func (couples *Couples) Consume(deps map[string]interface{}) (map[string]interface{}, error)
- func (couples *Couples) Finalize() interface{}
- func (couples *Couples) Initialize(repository *git.Repository)
- func (couples *Couples) Name() string
- func (couples *Couples) Provides() []string
- func (couples *Couples) Requires() []string
- type CouplesResult
- type DaysSinceStart
- func (days *DaysSinceStart) Consume(deps map[string]interface{}) (map[string]interface{}, error)
- func (days *DaysSinceStart) Finalize() interface{}
- func (days *DaysSinceStart) Initialize(repository *git.Repository)
- func (days *DaysSinceStart) Name() string
- func (days *DaysSinceStart) Provides() []string
- func (days *DaysSinceStart) Requires() []string
- type File
- type FileDiff
- func (diff *FileDiff) Consume(deps map[string]interface{}) (map[string]interface{}, error)
- func (diff *FileDiff) Finalize() interface{}
- func (diff *FileDiff) Initialize(repository *git.Repository)
- func (diff *FileDiff) Name() string
- func (diff *FileDiff) Provides() []string
- func (diff *FileDiff) Requires() []string
- type FileDiffData
- type FileGetter
- type IdentityDetector
- func (self *IdentityDetector) Consume(deps map[string]interface{}) (map[string]interface{}, error)
- func (id *IdentityDetector) Finalize() interface{}
- func (id *IdentityDetector) GeneratePeopleDict(commits []*object.Commit)
- func (id *IdentityDetector) Initialize(repository *git.Repository)
- func (id *IdentityDetector) LoadPeopleDict(path string) error
- func (id *IdentityDetector) Name() string
- func (id *IdentityDetector) Provides() []string
- func (id *IdentityDetector) Requires() []string
- type Pipeline
- type PipelineItem
- type RenameAnalysis
- func (ra *RenameAnalysis) Consume(deps map[string]interface{}) (map[string]interface{}, error)
- func (ra *RenameAnalysis) Finalize() interface{}
- func (ra *RenameAnalysis) Initialize(repository *git.Repository)
- func (ra *RenameAnalysis) Name() string
- func (ra *RenameAnalysis) Provides() []string
- func (ra *RenameAnalysis) Requires() []string
- type Status
- type TreeDiff
- func (treediff *TreeDiff) Consume(deps map[string]interface{}) (map[string]interface{}, error)
- func (treediff *TreeDiff) Finalize() interface{}
- func (treediff *TreeDiff) Initialize(repository *git.Repository)
- func (treediff *TreeDiff) Name() string
- func (treediff *TreeDiff) Provides() []string
- func (treediff *TreeDiff) Requires() []string
Constants ¶
const MISSING_AUTHOR = (1 << 18) - 1
const SELF_AUTHOR = (1 << 18) - 2
const TreeEnd int = -1
TreeEnd denotes the value of the last leaf in the tree.
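As a worked check of the two author constants above, the following sketch prints their numeric values. The lower-case names are local copies introduced for illustration so that the block compiles without importing hercules.

```go
package main

import "fmt"

// Local copies of the two author constants, mirrored here so the sketch
// is self-contained.
const (
	missingAuthor = (1 << 18) - 1 // 262143, the largest 18-bit value
	selfAuthor    = (1 << 18) - 2 // 262142, the second largest 18-bit value
)

func main() {
	fmt.Println(missingAuthor, selfAuthor)
}
```

Both sentinels sit at the top of the 18-bit range, so every smaller non-negative value stays distinct from them.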
Variables ¶
This section is empty.
Functions ¶
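func LoadCommitsFromFile ¶
func LoadCommitsFromFile(path string, repository *git.Repository) ([]*object.Commit, error)
func ParseMailmap ¶
func ParseMailmap(contents string) map[string]object.Signature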
ParseMailmap parses the contents of .mailmap and returns the mapping between signature parts. It does *not* follow the full signature matching convention, that is, developers are identified by email and by name independently.
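The independent indexing by email and by name can be sketched as follows. This is a simplified illustration, not the ParseMailmap implementation: the `signature` type is a local stand-in for object.Signature, `parseMailmapSketch` and `parseLine` are hypothetical names, and the handling of the .mailmap line forms is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// signature is a local stand-in for object.Signature.
type signature struct {
	Name  string
	Email string
}

// parseLine splits "Name <email> Name <email>" into (name, email) pairs.
func parseLine(line string) []signature {
	var pairs []signature
	for {
		open := strings.Index(line, "<")
		if open < 0 {
			break
		}
		end := strings.Index(line[open:], ">")
		if end < 0 {
			break
		}
		end += open
		pairs = append(pairs, signature{
			Name:  strings.TrimSpace(line[:open]),
			Email: line[open+1 : end],
		})
		line = line[end+1:]
	}
	return pairs
}

// parseMailmapSketch maps the commit email and the commit name, each on its
// own, to the canonical signature given first on the line.
func parseMailmapSketch(contents string) map[string]signature {
	result := map[string]signature{}
	for _, line := range strings.Split(contents, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		pairs := parseLine(line)
		if len(pairs) == 0 {
			continue
		}
		canonical, alias := pairs[0], pairs[len(pairs)-1]
		if alias.Email != "" {
			result[alias.Email] = canonical
		}
		if alias.Name != "" {
			result[alias.Name] = canonical
		}
	}
	return result
}

func main() {
	m := parseMailmapSketch("# aliases\nJane Doe <jane@example.com> jdoe <jd@old.example.com>\n")
	fmt.Println(m["jd@old.example.com"].Name, m["jdoe"].Email)
}
```

Because the two keys are stored separately, a commit matches when either its email or its name is known, which is looser than the full signature matching convention.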
Types ¶
type BlobCache ¶
type BlobCache struct {
	IgnoreMissingSubmodules bool
	// contains filtered or unexported fields
}
func (*BlobCache) Initialize ¶
func (cache *BlobCache) Initialize(repository *git.Repository)
type BurndownAnalysis ¶
type BurndownAnalysis struct {
	// Granularity sets the size of each band - the number of days it spans.
	// Smaller values provide better resolution but require more work and eat more
	// memory. 30 days is usually enough.
	Granularity int
	// Sampling sets how detailed is the statistic - the size of the interval in
	// days between consecutive measurements. It is usually a good idea to set it
	// <= Granularity. Try 15 or 30.
	Sampling int
	// TrackFiles enables or disables the fine-grained per-file burndown analysis.
	// It does not change the top level burndown results.
	TrackFiles bool
	// The number of developers for which to collect the burndown stats. 0 disables it.
	PeopleNumber int
	// Debug activates the debugging mode. Analyse() runs slower in this mode
	// but it accurately checks all the intermediate states for invariant
	// violations.
	Debug bool
	// contains filtered or unexported fields
}
BurndownAnalysis gathers the line burndown statistics for a Git repository.
func (*BurndownAnalysis) Consume ¶
func (analyser *BurndownAnalysis) Consume(deps map[string]interface{}) (map[string]interface{}, error)
func (*BurndownAnalysis) Finalize ¶
func (analyser *BurndownAnalysis) Finalize() interface{}
Finalize() returns the list of snapshots of the cumulative line edit times and similar lists for every file which is alive in HEAD. The number of snapshots (the first dimension of [][]int64) depends on BurndownAnalysis.Sampling (the larger Sampling, the fewer snapshots); the length of each snapshot depends on BurndownAnalysis.Granularity (the larger Granularity, the shorter each snapshot).
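To make the dependence on Sampling and Granularity concrete, here is a rough estimate of the result shape. It assumes one snapshot per Sampling interval and one band per Granularity interval, rounded up; `estimateShape` is a hypothetical helper, and the exact boundary handling in hercules may differ by one.

```go
package main

import "fmt"

// ceilDiv returns a/b rounded up, for positive b.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// estimateShape gives an approximate size of the burndown matrix for a
// history which spans the given number of days.
func estimateShape(days, sampling, granularity int) (snapshots, bands int) {
	return ceilDiv(days, sampling), ceilDiv(days, granularity)
}

func main() {
	for _, sampling := range []int{15, 30} {
		s, b := estimateShape(365, sampling, 30)
		fmt.Printf("sampling=%d: about %d snapshots of %d bands\n", sampling, s, b)
	}
}
```

Under these assumptions, halving Sampling roughly doubles the number of snapshots while the length of each snapshot stays the same.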
func (*BurndownAnalysis) Initialize ¶
func (analyser *BurndownAnalysis) Initialize(repository *git.Repository)
func (*BurndownAnalysis) Name ¶
func (analyser *BurndownAnalysis) Name() string
func (*BurndownAnalysis) Provides ¶
func (analyser *BurndownAnalysis) Provides() []string
func (*BurndownAnalysis) Requires ¶
func (analyser *BurndownAnalysis) Requires() []string
type BurndownResult ¶
type Couples ¶
type Couples struct {
	// The number of developers for which to build the matrix. 0 disables this analysis.
	PeopleNumber int
	// contains filtered or unexported fields
}
func (*Couples) Initialize ¶
func (couples *Couples) Initialize(repository *git.Repository)
type CouplesResult ¶
type DaysSinceStart ¶
type DaysSinceStart struct {
// contains filtered or unexported fields
}
func (*DaysSinceStart) Consume ¶
func (days *DaysSinceStart) Consume(deps map[string]interface{}) (map[string]interface{}, error)
func (*DaysSinceStart) Finalize ¶
func (days *DaysSinceStart) Finalize() interface{}
func (*DaysSinceStart) Initialize ¶
func (days *DaysSinceStart) Initialize(repository *git.Repository)
func (*DaysSinceStart) Name ¶
func (days *DaysSinceStart) Name() string
func (*DaysSinceStart) Provides ¶
func (days *DaysSinceStart) Provides() []string
func (*DaysSinceStart) Requires ¶
func (days *DaysSinceStart) Requires() []string
type File ¶
type File struct {
// contains filtered or unexported fields
}
File encapsulates a balanced binary tree to store line intervals and a cumulative mapping of values to the corresponding length counters. Users are not supposed to create File instances directly; instead, they should call NewFile(). NewFileFromTree() is a special constructor which is useful in tests.
Len() returns the number of lines in File.
Update() mutates File by introducing tree structural changes and updating the length mapping.
Dump() writes the tree to a string and Validate() checks the tree integrity.
func NewFile ¶
NewFile initializes a new instance of File struct.
time is the starting value of the first node;
length is the starting length of the tree (the key of the second and the last node);
statuses are the attached interval length mappings.
func NewFileFromTree ¶
NewFileFromTree is an alternative constructor for File which is used in tests. The resulting tree is validated with Validate() to ensure the initial integrity.
keys is a slice with the starting tree keys.
vals is a slice with the starting tree values. Must match the size of keys.
statuses are the attached interval length mappings.
func (*File) Dump ¶
Dump formats the underlying line interval tree into a string. Useful for error messages, panic()-s and debugging.
func (*File) Len ¶
Len returns the File's size - that is, the maximum key in the tree of line intervals.
func (*File) Update ¶
Update modifies the underlying tree to adapt to the specified line changes.
time is the time when the requested changes are made. Sets the values of the inserted nodes.
pos is the index of the line at which the changes are introduced.
ins_length is the number of inserted lines after pos.
del_length is the number of removed lines after pos. Deletions come before the insertions.
The code inside this function is probably the most important one throughout the project. It is extensively covered with tests. If you find a bug, please add the corresponding case in file_test.go.
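The semantics of Update can be illustrated on a much simpler model than the red-black tree. The sketch below keeps one time value per line in a plain slice and then compresses it into interval nodes; it is not the File implementation, and `update`, `intervals`, `node` and `treeEnd` are hypothetical names (`treeEnd` mirrors the TreeEnd constant).

```go
package main

import "fmt"

// treeEnd mirrors the TreeEnd constant: the value of the last node.
const treeEnd = -1

// node is one interval boundary: lines from key onwards carry value.
type node struct{ key, value int }

// update applies one edit to lines, where lines[i] is the time at which
// line i was last written. First del lines are removed starting at pos,
// then ins lines stamped with time are inserted at pos.
func update(lines []int, time, pos, ins, del int) []int {
	if pos < 0 || ins < 0 || del < 0 || pos+del > len(lines) {
		panic("edit out of range")
	}
	result := make([]int, 0, len(lines)-del+ins)
	result = append(result, lines[:pos]...)
	for i := 0; i < ins; i++ {
		result = append(result, time)
	}
	return append(result, lines[pos+del:]...)
}

// intervals compresses per-line times into nodes and appends the
// terminating node, keyed by the number of lines.
func intervals(lines []int) []node {
	var nodes []node
	for i, t := range lines {
		if i == 0 || lines[i-1] != t {
			nodes = append(nodes, node{i, t})
		}
	}
	return append(nodes, node{len(lines), treeEnd})
}

func main() {
	lines := make([]int, 10)         // ten lines written at time 0
	lines = update(lines, 5, 3, 2, 1) // at time 5: delete 1 line, insert 2 at line 3
	fmt.Println(len(lines), intervals(lines))
}
```

The slice model costs time proportional to the file length on every edit, which is the kind of overhead an interval tree is meant to avoid.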
func (*File) Validate ¶
func (file *File) Validate()
Validate checks the underlying line interval tree integrity. The checks are as follows:
1. The minimum key must be 0 because the first line index is always 0.
2. The last node must carry TreeEnd value. This is the maintained invariant which marks the ending of the last line interval.
3. Node keys must monotonically increase and never duplicate.
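The three checks listed above can be sketched over a flat, ordered slice of nodes. This is an illustration only: the real method works on the tree and has no return value, whereas the hypothetical `validate` below returns an error, and the `node` type and `treeEnd` name are local stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// treeEnd mirrors the TreeEnd constant.
const treeEnd = -1

// node is one interval boundary in key order.
type node struct{ key, value int }

// validate applies the three integrity checks to nodes.
func validate(nodes []node) error {
	if len(nodes) == 0 {
		return errors.New("no nodes")
	}
	// 1. The first line index is always 0.
	if nodes[0].key != 0 {
		return errors.New("minimum key is not 0")
	}
	// 2. The last node marks the end of the last line interval.
	if nodes[len(nodes)-1].value != treeEnd {
		return errors.New("last node does not carry treeEnd")
	}
	// 3. Keys strictly increase, so they never duplicate.
	for i := 1; i < len(nodes); i++ {
		if nodes[i].key <= nodes[i-1].key {
			return errors.New("keys do not strictly increase")
		}
	}
	return nil
}

func main() {
	fmt.Println(validate([]node{{0, 0}, {3, 5}, {5, 0}, {11, treeEnd}}))
	fmt.Println(validate([]node{{0, 0}, {3, 5}, {3, 0}, {11, treeEnd}}))
}
```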
type FileDiff ¶
type FileDiff struct { }
FileDiff calculates the difference of files which were modified.
func (*FileDiff) Initialize ¶
func (diff *FileDiff) Initialize(repository *git.Repository)
type FileDiffData ¶
type FileDiffData struct {
	OldLinesOfCode int
	NewLinesOfCode int
	Diffs          []diffmatchpatch.Diff
}
type IdentityDetector ¶
type IdentityDetector struct {
	// Maps email || name -> developer id.
	PeopleDict map[string]int
	// Maps developer id -> description
	ReversePeopleDict []string
}
func (*IdentityDetector) Consume ¶
func (self *IdentityDetector) Consume(deps map[string]interface{}) (map[string]interface{}, error)
func (*IdentityDetector) Finalize ¶
func (id *IdentityDetector) Finalize() interface{}
func (*IdentityDetector) GeneratePeopleDict ¶
func (id *IdentityDetector) GeneratePeopleDict(commits []*object.Commit)
func (*IdentityDetector) Initialize ¶
func (id *IdentityDetector) Initialize(repository *git.Repository)
func (*IdentityDetector) LoadPeopleDict ¶
func (id *IdentityDetector) LoadPeopleDict(path string) error
func (*IdentityDetector) Name ¶
func (id *IdentityDetector) Name() string
func (*IdentityDetector) Provides ¶
func (id *IdentityDetector) Provides() []string
func (*IdentityDetector) Requires ¶
func (id *IdentityDetector) Requires() []string
type Pipeline ¶
type Pipeline struct {
	// OnProgress is the callback which is invoked in Analyse() to output its
	// progress. The first argument is the number of processed commits and the
	// second is the total number of commits.
	OnProgress func(int, int)
	// contains filtered or unexported fields
}
func NewPipeline ¶
func NewPipeline(repository *git.Repository) *Pipeline
func (*Pipeline) AddItem ¶
func (pipeline *Pipeline) AddItem(item PipelineItem)
func (*Pipeline) Commits ¶
Commits returns the critical path in the repository's history. It starts from HEAD and traces commits backwards till the root. When it encounters a merge (more than one parent), it always chooses the first parent.
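The first-parent walk described above can be sketched on a toy commit graph. The `commit` type and `firstParentChain` function are hypothetical and do not use go-git; the sketch returns hashes in traversal order, HEAD first, and makes no claim about the order in which the real method returns its commits.

```go
package main

import "fmt"

// commit is a hypothetical, minimal commit node.
type commit struct {
	hash    string
	parents []*commit
}

// firstParentChain walks from head back to the root. At a merge it always
// follows the first parent and ignores the others.
func firstParentChain(head *commit) []string {
	var chain []string
	for c := head; c != nil; {
		chain = append(chain, c.hash)
		if len(c.parents) == 0 {
			break
		}
		c = c.parents[0]
	}
	return chain
}

func main() {
	root := &commit{hash: "root"}
	a := &commit{hash: "a", parents: []*commit{root}}
	b := &commit{hash: "b", parents: []*commit{a}}
	side := &commit{hash: "side", parents: []*commit{root}}
	merge := &commit{hash: "merge", parents: []*commit{b, side}}
	fmt.Println(firstParentChain(merge))
}
```

In this graph the commit on the side branch is never visited, because it is reachable only through the second parent of the merge.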
func (*Pipeline) Initialize ¶
func (pipeline *Pipeline) Initialize()
func (*Pipeline) RemoveItem ¶
func (pipeline *Pipeline) RemoveItem(item PipelineItem)
type PipelineItem ¶
type PipelineItem interface {
	// Name returns the name of the analysis.
	Name() string
	// Provides returns the list of keys of reusable calculated entities.
	// Other items may depend on them.
	Provides() []string
	// Requires returns the list of keys of needed entities which must be supplied in Consume().
	Requires() []string
	// Initialize prepares and resets the item. Consume() requires Initialize()
	// to be called at least once beforehand.
	Initialize(*git.Repository)
	// Consume processes the next commit.
	// deps contains the required entities which match Requires(). Besides, it always includes
	// "commit" and "index".
	// Returns the calculated entities which match Provides().
	Consume(deps map[string]interface{}) (map[string]interface{}, error)
	// Finalize returns the result of the analysis.
	Finalize() interface{}
}
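The lifecycle implied by this interface (Initialize once, Consume per commit, Finalize at the end) can be sketched with a toy item. To stay self-contained the block declares a local `pipelineItem` interface and a `repository` stand-in for *git.Repository; `commitCounter`, `run` and the "count" key are hypothetical, and treating the "index" entry as an int is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// repository is a stand-in for git.Repository.
type repository struct{}

// pipelineItem mirrors the method set of PipelineItem.
type pipelineItem interface {
	Name() string
	Provides() []string
	Requires() []string
	Initialize(*repository)
	Consume(deps map[string]interface{}) (map[string]interface{}, error)
	Finalize() interface{}
}

// commitCounter is a toy item which counts the commits it has consumed.
type commitCounter struct{ count int }

func (c *commitCounter) Name() string           { return "CommitCounter" }
func (c *commitCounter) Provides() []string     { return []string{"count"} }
func (c *commitCounter) Requires() []string     { return []string{} }
func (c *commitCounter) Initialize(*repository) { c.count = 0 }
func (c *commitCounter) Finalize() interface{}  { return c.count }

func (c *commitCounter) Consume(deps map[string]interface{}) (map[string]interface{}, error) {
	// Assumption: "index" is supplied as an int.
	if _, ok := deps["index"].(int); !ok {
		return nil, errors.New(`deps["index"] is missing or not an int`)
	}
	c.count++
	return map[string]interface{}{"count": c.count}, nil
}

// run drives a single item through the whole lifecycle.
func run(it pipelineItem, commits []string) (interface{}, error) {
	it.Initialize(&repository{})
	for i, c := range commits {
		deps := map[string]interface{}{"commit": c, "index": i}
		if _, err := it.Consume(deps); err != nil {
			return nil, err
		}
	}
	return it.Finalize(), nil
}

func main() {
	result, err := run(&commitCounter{}, []string{"c1", "c2", "c3"})
	fmt.Println(result, err)
}
```

Resetting the counter in Initialize follows the documented contract that Initialize "prepares and resets the item", so the same item can be reused for a second run.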
type RenameAnalysis ¶
type RenameAnalysis struct {
	// SimilarityThreshold adjusts the heuristic to determine file renames.
	// It has the same units as cgit's -X rename-threshold or -M. Better to
	// set it to the default value of 90 (90%).
	SimilarityThreshold int
	// contains filtered or unexported fields
}
func (*RenameAnalysis) Consume ¶
func (ra *RenameAnalysis) Consume(deps map[string]interface{}) (map[string]interface{}, error)
func (*RenameAnalysis) Finalize ¶
func (ra *RenameAnalysis) Finalize() interface{}
func (*RenameAnalysis) Initialize ¶
func (ra *RenameAnalysis) Initialize(repository *git.Repository)
func (*RenameAnalysis) Name ¶
func (ra *RenameAnalysis) Name() string
func (*RenameAnalysis) Provides ¶
func (ra *RenameAnalysis) Provides() []string
func (*RenameAnalysis) Requires ¶
func (ra *RenameAnalysis) Requires() []string
type Status ¶
type Status struct {
// contains filtered or unexported fields
}
Status is something we would like to update during File.Update().
Directories ¶
Path | Synopsis |
---|---|
cmd/hercules | Package main provides the command line tool to gather the line burndown statistics from Git repositories. |
 | Package pb is a generated protocol buffer package. |