git

package module
v3.1.0+incompatible
Warning

This package is not in the latest version of its module.

Published: Jul 4, 2016 License: MIT Imports: 22 Imported by: 0

README

go-git

A low-level and highly extensible git client library for reading repositories from git servers. It is written in Go from scratch, without any C dependencies.

We have been following the open/closed principle in its design to facilitate extensions.

go-git does not claim to be a replacement for git2go, as its approach and functionality are quite different.

ok, but why? ...

At source{d} we analyze almost all the public open source contributions made to git repositories in the world.

We want to extract detailed information from each GitHub repository, which requires downloading repository packfiles and analyzing them: extracting their code, authors, dates and the languages and ecosystems they use. We are also interested in knowing who contributes to what, so we can tell top contributors from the more casual ones.

You can obtain all this information by running the standard git command over a local clone of a repository, but this simple solution does not scale well over millions of repositories. We want to avoid keeping local copies of the unpacked repositories in a regular file system; go-git lets us work with an in-memory representation of repositories instead.

I see... but is this production ready?

Yes! We have been using go-git at source{d} since August 2015 to analyze all GitHub public repositories (i.e. 16M repositories).

Coming Soon

Blame support: right now we are using a forward version of a line-tracking algorithm and we are having some problems handling merges. The plan is to get merges right and change to a backward line-tracking algorithm soon.

Installation

The recommended way to install go-git is:

go get -u gopkg.in/src-d/go-git.v3/...

Examples

Retrieving the commits for a given repository:

r, err := git.NewRepository("https://github.com/src-d/go-git", nil)
if err != nil {
	panic(err)
}

if err := r.PullDefault(); err != nil {
	panic(err)
}

iter, err := r.Commits()
if err != nil {
	panic(err)
}
defer iter.Close()

for {
	// the commits are not sorted in any special order
	commit, err := iter.Next()
	if err != nil {
		if err == io.EOF {
			break
		}

		panic(err)
	}

	fmt.Println(commit)
}

Outputs:

commit 2275fa7d0c75d20103f90b0e1616937d5a9fc5e6
Author: Máximo Cuadros <mcuadros@gmail.com>
Date:   2015-10-23 00:44:33 +0200 +0200

commit 35b585759cbf29f8ec428ef89da20705d59f99ec
Author: Carlos Cobo <toqueteos@gmail.com>
Date:   2015-05-20 15:21:37 +0200 +0200

commit 7e3259c191a9de23d88b6077dcb1cd427e925432
Author: Alberto Cortés <alberto@sourced.tech>
Date:   2016-01-21 03:29:57 +0100 +0100

commit 24b8ae50db91f3909b11304014564bffc6fdee79
Author: Alberto Cortés <alberto@sourced.tech>
Date:   2015-12-11 17:57:10 +0100 +0100
...

Retrieving the latest commit for a given repository:

r, err := git.NewRepository("https://github.com/src-d/go-git", nil)
if err != nil {
	panic(err)
}

if err := r.PullDefault(); err != nil {
	panic(err)
}

hash, err := r.Remotes[git.DefaultRemoteName].Head()
if err != nil {
	panic(err)
}

commit, err := r.Commit(hash)
if err != nil {
	panic(err)
}

fmt.Println(commit)

Creating a repository from an ordinary local git directory (one that has previously been prepared by running git gc on it):

// Download any git repository and prepare it as follows:
//
//   $ git clone https://github.com/src-d/go-git /tmp/go-git
//   $ pushd /tmp/go-git ; git gc ; popd
//
// Then, create a go-git repository from the local content
// and print its commits as follows:

package main

import (
	"fmt"
	"io"

	"gopkg.in/src-d/go-git.v3"
	"gopkg.in/src-d/go-git.v3/utils/fs"
)

func main() {
	fs := fs.NewOS() // a simple proxy for the local host filesystem
	path := "/tmp/go-git/.git"

	repo, err := git.NewRepositoryFromFS(fs, path)
	if err != nil {
		panic(err)
	}

	iter, err := repo.Commits()
	if err != nil {
		panic(err)
	}
	defer iter.Close()

	for {
		commit, err := iter.Next()
		if err != nil {
			if err == io.EOF {
				break
			}
			panic(err)
		}

		fmt.Println(commit)
	}
}

Implementing your own filesystem will let you access repositories stored on remote services (e.g. Amazon S3). See the examples directory for a simple filesystem implementation and its usage.

Wrapping

go-git can be wrapped into any language that supports shared library interop. A Python wrapper already exists. Wrapping is provided by the "cshared" cgo files, which can be built with go build -o libgogit.so -buildmode=c-shared github.com/src-d/go-git/cshared.

Acknowledgements

The earlier versions of the packfile reader are based on git-chain, a project by @yrashk.

License

MIT, see LICENSE

Documentation

Overview

Package git is a low-level and highly extensible git client library for reading repositories from git servers. It is written in Go from scratch, without any C dependencies.

We have been following the open/closed principle in its design to facilitate extensions.

Small example extracting the commits from a repository:

func ExampleBasic_printCommits() {
	r, err := git.NewRepository("https://github.com/src-d/go-git", nil)
	if err != nil {
		panic(err)
	}

	if err := r.Pull("origin", "refs/heads/master"); err != nil {
		panic(err)
	}

	// Commits returns both an iterator and an error (see Repository.Commits).
	iter, err := r.Commits()
	if err != nil {
		panic(err)
	}
	defer iter.Close()

	for {
		commit, err := iter.Next()
		if err != nil {
			if err == io.EOF {
				break
			}

			panic(err)
		}

		fmt.Println(commit)
	}
}

Constants

const (
	// DefaultRemoteName name of the default Remote, just like git command
	DefaultRemoteName = "origin"
)

Variables

var (
	ErrMaxTreeDepth = errors.New("maximum tree depth exceeded")
	ErrFileNotFound = errors.New("file not found")
)

New errors defined by this package.

var (
	// ErrObjectNotFound object not found
	ErrObjectNotFound = errors.New("object not found")
)

var ErrUnsupportedObject = errors.New("unsupported object type")

ErrUnsupportedObject is returned when an unsupported object type is being decoded.

Functions

func SortCommits

func SortCommits(l []*Commit)

SortCommits sorts a commit list by commit date, from older to newer.

Types

type Blame

type Blame struct {
	Path  string
	Rev   core.Hash
	Lines []*line
}

type Blob

type Blob struct {
	Hash core.Hash
	Size int64
	// contains filtered or unexported fields
}

Blob is used to store file data - it is generally a file.

func (*Blob) Decode

func (b *Blob) Decode(o core.Object) error

Decode transforms a core.Object into a Blob struct.

func (*Blob) ID

func (b *Blob) ID() core.Hash

ID returns the object ID of the blob. The returned value will always match the current value of Blob.Hash.

ID is present to fulfill the Object interface.

func (*Blob) Reader

func (b *Blob) Reader() (core.ObjectReader, error)

Reader returns a reader that allows access to the content of the blob.

func (*Blob) Type

func (b *Blob) Type() core.ObjectType

Type returns the type of object. It always returns core.BlobObject.

Type is present to fulfill the Object interface.

type Commit

type Commit struct {
	Hash      core.Hash
	Author    Signature
	Committer Signature
	Message   string
	// contains filtered or unexported fields
}

Commit points to a single tree, marking it as what the project looked like at a certain point in time. It contains meta-information about that point in time, such as a timestamp, the author of the changes since the last commit, a pointer to the previous commit(s), etc. http://schacon.github.io/gitbook/1_the_git_object_model.html

func (*Commit) Blame

func (c *Commit) Blame(path string) (*Blame, error)

Blame returns the last commit that modified each line of a file in a repository.

The file to blame is identified by the receiver commit and the path argument. The result holds one entry for each line in the file.

Blaming a file is a two step process:

1. Create a linear history of the commits affecting a file. We use revlist.New for that.

2. Then build a graph with a node for every line in every file in the history of the file.

Each node (line) holds the commit where it was introduced or last modified. To achieve that we use the FORWARD algorithm described in Zimmermann, et al. "Mining Version Archives for Co-changed Lines", in proceedings of the Mining Software Repositories workshop, Shanghai, May 22-23, 2006.

Each node is assigned a commit: Start by the nodes in the first commit. Assign that commit as the creator of all its lines.

Then jump to the nodes in the next commit, and calculate the diff between the two files. Newly created lines get assigned the new commit as their origin. Modified lines also get this new commit. Untouched lines retain the old commit.

All this work is done in the assignOrigin function which holds all the internal relevant data in a "blame" struct, that is not exported.
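
The forward assignment described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not go-git's implementation: the revision type and the matchLines and blameForward functions are hypothetical, the history is assumed to be already linear, and a plain longest-common-subsequence table stands in for the real diff.

```go
package main

import "fmt"

// revision is a hypothetical stand-in for one version of a file in its history.
type revision struct {
	Commit string
	Lines  []string
}

// matchLines returns, for every line in cur, the index of the matching line
// in prev, or -1 if the line is new or modified. It uses a plain
// longest-common-subsequence table as a stand-in for a real diff.
func matchLines(prev, cur []string) []int {
	// lcs[i][j] is the LCS length of prev[i:] and cur[j:].
	lcs := make([][]int, len(prev)+1)
	for i := range lcs {
		lcs[i] = make([]int, len(cur)+1)
	}
	for i := len(prev) - 1; i >= 0; i-- {
		for j := len(cur) - 1; j >= 0; j-- {
			switch {
			case prev[i] == cur[j]:
				lcs[i][j] = lcs[i+1][j+1] + 1
			case lcs[i+1][j] >= lcs[i][j+1]:
				lcs[i][j] = lcs[i+1][j]
			default:
				lcs[i][j] = lcs[i][j+1]
			}
		}
	}

	match := make([]int, len(cur))
	for j := range match {
		match[j] = -1
	}
	// Walk the table from the start to recover which lines were kept.
	for i, j := 0, 0; i < len(prev) && j < len(cur); {
		switch {
		case prev[i] == cur[j]:
			match[j] = i
			i++
			j++
		case lcs[i+1][j] >= lcs[i][j+1]:
			i++
		default:
			j++
		}
	}
	return match
}

// blameForward walks the history from oldest to newest and returns the
// commit that introduced or last modified each line of the final revision.
func blameForward(history []revision) []string {
	if len(history) == 0 {
		return nil
	}
	// Every line of the first revision originates in the first commit.
	origin := make([]string, len(history[0].Lines))
	for i := range origin {
		origin[i] = history[0].Commit
	}
	for k := 1; k < len(history); k++ {
		match := matchLines(history[k-1].Lines, history[k].Lines)
		next := make([]string, len(match))
		for j, i := range match {
			if i >= 0 {
				next[j] = origin[i] // untouched line retains the old commit
			} else {
				next[j] = history[k].Commit // new or modified line
			}
		}
		origin = next
	}
	return origin
}

func main() {
	history := []revision{
		{Commit: "c1", Lines: []string{"a", "b", "c"}},
		{Commit: "c2", Lines: []string{"a", "B", "c", "d"}},
		{Commit: "c3", Lines: []string{"a", "B", "d"}},
	}
	fmt.Println(blameForward(history))
}
```

Each step only needs the previous revision and its origins, which is why the forward approach has to visit every revision in the file's history before it can answer for the last one.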

TODO: ways to improve the efficiency of this function:

1. Improve revlist

2. Improve how the history is traversed (for example, a backward traversal would be much more efficient)

TODO: ways to improve the function in general:

1. Add memoization between revlist and assign.

2. It is using much more memory than needed, see the TODOs below.

func (*Commit) Decode

func (c *Commit) Decode(o core.Object) (err error)

Decode transforms a core.Object into a Commit struct.

func (*Commit) File

func (c *Commit) File(path string) (file *File, err error)

File returns the file with the specified "path" in the commit and a nil error if the file exists. If the file does not exist, it returns a nil file and the ErrFileNotFound error.

func (*Commit) ID

func (c *Commit) ID() core.Hash

ID returns the object ID of the commit. The returned value will always match the current value of Commit.Hash.

ID is present to fulfill the Object interface.

func (*Commit) NumParents

func (c *Commit) NumParents() int

NumParents returns the number of parents in a commit.

func (*Commit) Parents

func (c *Commit) Parents() *CommitIter

Parents returns a CommitIter over the parent commits.

func (*Commit) References

func (c *Commit) References(path string) ([]*Commit, error)

References returns the references (commits) for the file at "path"; the commits are sorted in commit order. It stops searching a branch for a file upon reaching the commit where the file was created.

Caveats:

  • Moves and copies are not currently supported.
  • Cherry-picks are not detected unless there are no commits between them, and therefore can appear repeated in the list (see git patch-id for hints on how to fix this).

func (*Commit) String

func (c *Commit) String() string

func (*Commit) Tree

func (c *Commit) Tree() *Tree

Tree returns the Tree from the commit

func (*Commit) Type

func (c *Commit) Type() core.ObjectType

Type returns the type of object. It always returns core.CommitObject.

Type is present to fulfill the Object interface.

type CommitIter

type CommitIter struct {
	core.ObjectIter
	// contains filtered or unexported fields
}

CommitIter provides an iterator for a set of commits.

func NewCommitIter

func NewCommitIter(r *Repository, iter core.ObjectIter) *CommitIter

NewCommitIter returns a CommitIter for the given repository and underlying object iterator.

The returned CommitIter will automatically skip over non-commit objects.

func (*CommitIter) Next

func (iter *CommitIter) Next() (*Commit, error)

Next moves the iterator to the next commit and returns a pointer to it. If it has reached the end of the set it will return io.EOF.

type File

type File struct {
	Name string
	Mode os.FileMode
	Blob
}

File represents git file objects.

func (*File) Contents

func (f *File) Contents() (content string, err error)

Contents returns the contents of a file as a string.

func (*File) Lines

func (f *File) Lines() ([]string, error)

Lines returns a slice of lines from the contents of a file, stripping all end of line characters. If the last line is empty (does not end in an end of line), it is also stripped.
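
A minimal sketch of this splitting rule, assuming "\n" is the only end of line sequence. The splitLines helper is hypothetical and is not go-git's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLines is a hypothetical helper that splits content on "\n" and
// drops the empty element left behind by a trailing newline.
func splitLines(content string) []string {
	lines := strings.Split(content, "\n")
	// strings.Split always returns at least one element, so indexing is safe.
	if lines[len(lines)-1] == "" {
		lines = lines[:len(lines)-1]
	}
	return lines
}

func main() {
	fmt.Printf("%q\n", splitLines("first\nsecond\n"))
	fmt.Printf("%q\n", splitLines("first\n\nthird"))
}
```

Empty lines in the middle of the content are kept; only the empty final element is removed.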

type FileIter

type FileIter struct {
	// contains filtered or unexported fields
}

func NewFileIter

func NewFileIter(r *Repository, t *Tree) *FileIter

func (*FileIter) Close

func (iter *FileIter) Close()

func (*FileIter) Next

func (iter *FileIter) Next() (*File, error)

type Hash

type Hash core.Hash

Hash is the hash of an object.

type Object

type Object interface {
	ID() core.Hash
	Type() core.ObjectType
	Decode(core.Object) error
}

Object is a generic representation of any git object. It is implemented by Commit, Tree, Blob and Tag, and includes the functions that are common to them.

Object is returned when an object could be of any type. It is frequently used with a type switch to acquire the specific type of object:

func process(obj Object) {
	switch o := obj.(type) {
	case *Commit:
		// o is a Commit
	case *Tree:
		// o is a Tree
	case *Blob:
		// o is a Blob
	case *Tag:
		// o is a Tag
	}
}

This interface is intentionally different from core.Object, which is a lower level interface used by storage implementations to read and write objects.

type Remote

type Remote struct {
	Endpoint common.Endpoint
	Auth     common.AuthMethod
	// contains filtered or unexported fields
}

Remote represents a connection to a remote repository

func NewAuthenticatedRemote

func NewAuthenticatedRemote(url string, auth common.AuthMethod) (*Remote, error)

NewAuthenticatedRemote returns a new Remote using the given AuthMethod, with http.DefaultClient as the client.

func NewRemote

func NewRemote(url string) (*Remote, error)

NewRemote returns a new Remote, with http.DefaultClient as the client.

func (*Remote) Capabilities

func (r *Remote) Capabilities() *common.Capabilities

Capabilities returns the remote capabilities

func (*Remote) Connect

func (r *Remote) Connect() error

Connect connects to the endpoint.

func (*Remote) DefaultBranch

func (r *Remote) DefaultBranch() string

DefaultBranch returns the name of the remote's default branch

func (*Remote) Fetch

Fetch returns a reader using the request

func (*Remote) FetchDefaultBranch

func (r *Remote) FetchDefaultBranch() (io.ReadCloser, error)

FetchDefaultBranch returns a reader for the default branch

func (*Remote) Head

func (r *Remote) Head() (core.Hash, error)

Head returns the Hash of the HEAD

func (*Remote) Info

func (r *Remote) Info() *common.GitUploadPackInfo

Info returns the git-upload-pack info

func (*Remote) Ref

func (r *Remote) Ref(refName string) (core.Hash, error)

Ref returns the Hash that the given refName points to.

func (*Remote) Refs

func (r *Remote) Refs() map[string]core.Hash

Refs returns a map from each reference name to the Hash it points to.

type Repository

type Repository struct {
	Remotes map[string]*Remote
	Storage core.ObjectStorage
}

Repository represents a git repository.

func NewPlainRepository

func NewPlainRepository() *Repository

NewPlainRepository creates a new repository without remotes

func NewRepository

func NewRepository(url string, auth common.AuthMethod) (*Repository, error)

NewRepository creates a new repository, setting the remote at the given URL as the default remote.

func NewRepositoryFromFS

func NewRepositoryFromFS(fs fs.FS, path string) (*Repository, error)

NewRepositoryFromFS creates a new repository from a standard git repository on disk.

Repositories created like this don't hold a local copy of the original repository objects; instead, all queries are resolved by looking at the original repository packfile. This is very cheap in terms of memory and makes it possible to process repositories bigger than your memory.

To be able to use git repositories this way, you must run "git gc" on them beforehand.

func (*Repository) Blob

func (r *Repository) Blob(h core.Hash) (*Blob, error)

Blob returns the blob with the given hash

func (*Repository) Commit

func (r *Repository) Commit(h core.Hash) (*Commit, error)

Commit returns the commit with the given hash.

func (*Repository) Commits

func (r *Repository) Commits() (*CommitIter, error)

Commits returns an iterator that decodes the repository's objects into commits.

func (*Repository) Object

func (r *Repository) Object(h core.Hash) (Object, error)

Object returns an object with the given hash.

func (*Repository) Pull

func (r *Repository) Pull(remoteName, branch string) (err error)

Pull connects to the given remote and fetches the given branch. The branch must be provided as a full reference name, not only the abbreviation, e.g. "refs/heads/master".
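
If callers only have a short branch name at hand, a small helper can expand it before calling Pull. The fullBranchRef function below is a hypothetical convenience that is not part of go-git; it assumes that anything not already starting with "refs/" is a branch name.

```go
package main

import (
	"fmt"
	"strings"
)

// fullBranchRef is a hypothetical helper that expands a short branch name
// such as "master" into "refs/heads/master". Names that already start with
// "refs/" are returned unchanged.
func fullBranchRef(branch string) string {
	if strings.HasPrefix(branch, "refs/") {
		return branch
	}
	return "refs/heads/" + branch
}

func main() {
	fmt.Println(fullBranchRef("master"))
	fmt.Println(fullBranchRef("refs/heads/master"))
}
```

The result would then be passed as the second argument, for example r.Pull("origin", fullBranchRef("master")).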

func (*Repository) PullDefault

func (r *Repository) PullDefault() (err error)

PullDefault is like Pull, but retrieves the default branch from the default remote.

func (*Repository) Tag

func (r *Repository) Tag(h core.Hash) (*Tag, error)

Tag returns a tag with the given hash.

func (*Repository) Tags

func (r *Repository) Tags() (*TagIter, error)

Tags returns a TagIter that can step through all of the annotated tags in the repository.

func (*Repository) Tree

func (r *Repository) Tree(h core.Hash) (*Tree, error)

Tree returns the tree with the given hash.

type Signature

type Signature struct {
	Name  string
	Email string
	When  time.Time
}

Signature represents an action signed by a person

func (*Signature) Decode

func (s *Signature) Decode(b []byte)

Decode decodes a byte slice into a signature
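
For orientation, git stores a signature as "Name <email> unix-seconds timezone", for example in the author line of a commit. The sketch below parses that layout with the standard library; the signature type and the parseOffset and parseSignature functions are hypothetical stand-ins and may differ from what Signature.Decode actually does, in particular on malformed input.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"time"
)

// signature is a hypothetical stand-in for go-git's Signature type.
type signature struct {
	Name  string
	Email string
	When  time.Time
}

// parseOffset converts "+hhmm" or "-hhmm" into seconds east of UTC.
func parseOffset(tz string) (int, bool) {
	if len(tz) != 5 || (tz[0] != '+' && tz[0] != '-') {
		return 0, false
	}
	n, err := strconv.Atoi(tz[1:])
	if err != nil {
		return 0, false
	}
	secs := (n/100)*3600 + (n%100)*60
	if tz[0] == '-' {
		secs = -secs
	}
	return secs, true
}

// parseSignature parses "Name <email> unix-seconds timezone".
// Parts that cannot be parsed are left at their zero values.
func parseSignature(b []byte) signature {
	var s signature
	open := bytes.IndexByte(b, '<')
	end := bytes.IndexByte(b, '>')
	if open < 0 || end < open {
		return s
	}
	s.Name = string(bytes.TrimSpace(b[:open]))
	s.Email = string(b[open+1 : end])

	fields := bytes.Fields(b[end+1:])
	if len(fields) != 2 {
		return s
	}
	secs, err := strconv.ParseInt(string(fields[0]), 10, 64)
	if err != nil {
		return s
	}
	off, ok := parseOffset(string(fields[1]))
	if !ok {
		return s
	}
	s.When = time.Unix(secs, 0).In(time.FixedZone("", off))
	return s
}

func main() {
	s := parseSignature([]byte("Máximo Cuadros <mcuadros@gmail.com> 1445553873 +0200"))
	fmt.Println(s.Name)
	fmt.Println(s.Email)
	fmt.Println(s.When.Format("2006-01-02 15:04:05 -0700"))
}
```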

func (*Signature) String

func (s *Signature) String() string

type Tag

type Tag struct {
	Hash       core.Hash
	Name       string
	Tagger     Signature
	Message    string
	TargetType core.ObjectType
	Target     core.Hash
	// contains filtered or unexported fields
}

Tag represents an annotated tag object. It points to a single git object of any type, but tags typically are applied to commit or blob objects. It provides a reference that associates the target with a tag name. It also contains meta-information about the tag, including the tagger, tag date and message.

https://git-scm.com/book/en/v2/Git-Internals-Git-References#Tags

func (*Tag) Blob

func (t *Tag) Blob() (*Blob, error)

Blob returns the blob pointed to by the tag. If the tag points to a different type of object ErrUnsupportedObject will be returned.

func (*Tag) Commit

func (t *Tag) Commit() (*Commit, error)

Commit returns the commit pointed to by the tag. If the tag points to a different type of object ErrUnsupportedObject will be returned.

func (*Tag) Decode

func (t *Tag) Decode(o core.Object) (err error)

Decode transforms a core.Object into a Tag struct.

func (*Tag) ID

func (t *Tag) ID() core.Hash

ID returns the object ID of the tag, not the object that the tag references. The returned value will always match the current value of Tag.Hash.

ID is present to fulfill the Object interface.

func (*Tag) Object

func (t *Tag) Object() (Object, error)

Object returns the object pointed to by the tag.

func (*Tag) String

func (t *Tag) String() string

String returns the meta information contained in the tag as a formatted string.

func (*Tag) Tree

func (t *Tag) Tree() (*Tree, error)

Tree returns the tree pointed to by the tag. If the tag points to a commit object the tree of that commit will be returned. If the tag does not point to a commit or tree object ErrUnsupportedObject will be returned.
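
The resolution rule can be sketched with a type switch over stand-in types. The tree, commit, blob and tag types, the resolveTree method and the errUnsupportedObject value below are hypothetical and only mirror the three cases listed above.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnsupportedObject is a stand-in for the package's ErrUnsupportedObject.
var errUnsupportedObject = errors.New("unsupported object type")

// Hypothetical stand-ins for the object types involved.
type tree struct{ ID string }
type commit struct{ Tree *tree }
type blob struct{}

// tag points at a single target object of any type.
type tag struct {
	Target interface{}
}

// resolveTree returns the target itself when it is a tree, the commit's
// tree when the target is a commit, and an error for anything else.
func (t *tag) resolveTree() (*tree, error) {
	switch o := t.Target.(type) {
	case *tree:
		return o, nil
	case *commit:
		return o.Tree, nil
	default:
		return nil, errUnsupportedObject
	}
}

func main() {
	root := &tree{ID: "t1"}

	onCommit := &tag{Target: &commit{Tree: root}}
	got, err := onCommit.resolveTree()
	fmt.Println(got.ID, err)

	onBlob := &tag{Target: &blob{}}
	_, err = onBlob.resolveTree()
	fmt.Println(err)
}
```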

func (*Tag) Type

func (t *Tag) Type() core.ObjectType

Type returns the type of object. It always returns core.TagObject.

Type is present to fulfill the Object interface.

type TagIter

type TagIter struct {
	core.ObjectIter
	// contains filtered or unexported fields
}

TagIter provides an iterator for a set of tags.

func NewTagIter

func NewTagIter(r *Repository, iter core.ObjectIter) *TagIter

NewTagIter returns a TagIter for the given repository and underlying object iterator.

The returned TagIter will automatically skip over non-tag objects.

func (*TagIter) Next

func (iter *TagIter) Next() (*Tag, error)

Next moves the iterator to the next tag and returns a pointer to it. If it has reached the end of the set it will return io.EOF.

type Tree

type Tree struct {
	Entries []TreeEntry
	Hash    core.Hash
	// contains filtered or unexported fields
}

Tree is basically like a directory - it references a bunch of other trees and/or blobs (i.e. files and sub-directories)

func (*Tree) Decode

func (t *Tree) Decode(o core.Object) (err error)

Decode transforms a core.Object into a Tree struct.

func (*Tree) File

func (t *Tree) File(path string) (*File, error)

File returns the file identified by the `path` argument. The path is interpreted as relative to the tree receiver.
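
A lookup relative to a tree can be pictured as walking one path component at a time. The sketch below uses a hypothetical in-memory node type and a findFile function; it is not how go-git reads trees from storage, and the error value is a stand-in for the package's ErrFileNotFound.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errFileNotFound is a stand-in for the package's ErrFileNotFound.
var errFileNotFound = errors.New("file not found")

// node is a hypothetical in-memory stand-in for a tree or a file.
type node struct {
	Children map[string]*node // non-nil for trees
	Content  string           // used by files
}

// findFile resolves a slash-separated path relative to root.
func findFile(root *node, path string) (*node, error) {
	cur := root
	for _, name := range strings.Split(path, "/") {
		if cur.Children == nil {
			return nil, errFileNotFound // cannot descend into a file
		}
		next, ok := cur.Children[name]
		if !ok {
			return nil, errFileNotFound
		}
		cur = next
	}
	if cur.Children != nil {
		return nil, errFileNotFound // the path names a tree, not a file
	}
	return cur, nil
}

func main() {
	root := &node{Children: map[string]*node{
		"README": {Content: "hello"},
		"cmd": {Children: map[string]*node{
			"main.go": {Content: "package main"},
		}},
	}}

	f, err := findFile(root, "cmd/main.go")
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Content)

	_, err = findFile(root, "cmd/missing.go")
	fmt.Println(err)
}
```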

func (*Tree) Files

func (t *Tree) Files() *FileIter

Files returns a FileIter for iterating over the files in the Tree.

func (*Tree) ID

func (t *Tree) ID() core.Hash

ID returns the object ID of the tree. The returned value will always match the current value of Tree.Hash.

ID is present to fulfill the Object interface.

func (*Tree) Type

func (t *Tree) Type() core.ObjectType

Type returns the type of object. It always returns core.TreeObject.

type TreeEntry

type TreeEntry struct {
	Name string
	Mode os.FileMode
	Hash core.Hash
}

TreeEntry represents a file

type TreeIter

type TreeIter struct {
	// contains filtered or unexported fields
}

TreeIter facilitates iterating through the descendant subtrees of a Tree.

func NewTreeIter

func NewTreeIter(r *Repository, t *Tree) *TreeIter

NewTreeIter returns a new TreeIter instance

func (*TreeIter) Close

func (iter *TreeIter) Close()

Close closes the TreeIter

func (*TreeIter) Next

func (iter *TreeIter) Next() (*Tree, error)

Next returns the next Tree from the tree.

type TreeWalker

type TreeWalker struct {
	// contains filtered or unexported fields
}

TreeWalker provides a means of walking through all of the entries in a Tree.

func NewTreeWalker

func NewTreeWalker(r *Repository, t *Tree) *TreeWalker

NewTreeWalker returns a new TreeWalker for the given repository and tree.

It is the caller's responsibility to call Close() when finished with the tree walker.

func (*TreeWalker) Close

func (w *TreeWalker) Close()

Close releases any resources used by the TreeWalker.

func (*TreeWalker) Next

func (w *TreeWalker) Next() (name string, entry TreeEntry, obj Object, err error)

Next returns the next object from the tree. Objects are returned in order and subtrees are included. After the last object has been returned further calls to Next() will return io.EOF.

In the current implementation any objects which cannot be found in the underlying repository will be skipped automatically. It is possible that this may change in future versions.

func (*TreeWalker) Tree

func (w *TreeWalker) Tree() *Tree

Tree returns the tree that the tree walker most recently operated on.

Directories

Path Synopsis
Go-git needs the packfile and the refs of the repo.
common
Package common contains utils used by the clients
http
Package http implements a HTTP client for go-git.
ssh
Package ssh implements a ssh client for go-git.
Package core implement the core interfaces and structs used by go-git
+build ignore
Package diff implements line oriented diffs, similar to the ancient Unix diff command.
examples
formats
idxfile
== Original (version 1) pack-*.idx files have the following format:
packfile
Package packfile documentation:
storage
utils
fs
