gitbase

Version: v0.11.0
Published: May 9, 2018 License: Apache-2.0 Imports: 29 Imported by: 5

README

gitbase

Query git repositories with a MySQL interface.

Installation

Check the Releases page to download the gitbase binary.

Installing from source

Because gitbase uses bblfsh's client-go, which uses cgo, you need to install some dependencies by hand instead of just using go get.

go get github.com/src-d/gitbase/...
cd $GOPATH/src/github.com/src-d/gitbase
make dependencies

Usage

Usage:
  gitbase [OPTIONS] <server | version>

Help Options:
  -h, --help  Show this help message

Available commands:
  server   Start SQL server.
  version  Show the version information.

You can start a server using some repositories from /path/to/repositories with this command:

$ gitbase server -v -g /path/to/repositories

A MySQL client is needed to connect to the server. For example:

$ mysql -q -u root -h 127.0.0.1
MySQL [(none)]> SELECT hash, author_email, author_name FROM commits LIMIT 2;
+------------------------------------------+---------------------+-----------------------+
| hash                                     | author_email        | author_name           |
+------------------------------------------+---------------------+-----------------------+
| 003dc36e0067b25333cb5d3a5ccc31fd028a1c83 | user1@test.io       | Santiago M. Mola      |
| 01ace9e4d144aaeb50eb630fed993375609bcf55 | user2@test.io       | Antonio Navarro Perez |
+------------------------------------------+---------------------+-----------------------+
2 rows in set (0.01 sec)

Environment variables

Name                            Description
BBLFSH_ENDPOINT                 bblfshd endpoint, default "127.0.0.1:9432"
GITBASE_BLOBS_MAX_SIZE          maximum blob size to return in MiB, default 5 MiB
GITBASE_BLOBS_ALLOW_BINARY      enable retrieval of binary blobs, default false
GITBASE_UNSTABLE_SQUASH_ENABLE  UNSTABLE, check Unstable features
GITBASE_SKIP_GIT_ERRORS         do not stop queries on git errors, default disabled
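
As an illustration of how the documented defaults could be applied, here is a minimal Go sketch that reads three of these variables. The Config struct, the loadConfig helper, and the fallback to the default on malformed values are assumptions made for this example; only the variable names and default values come from the table above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors some of the documented environment variables. The struct
// and its field names are hypothetical; only the variable names and the
// defaults come from the table above.
type Config struct {
	BblfshEndpoint   string
	BlobsMaxSizeMiB  int
	BlobsAllowBinary bool
}

// lookupFunc abstracts os.LookupEnv so the parsing can be exercised
// without touching the real process environment.
type lookupFunc func(key string) (string, bool)

// loadConfig starts from the documented defaults and overrides them with
// any values found through lookup. Falling back to the default on a
// malformed value is an assumption of this sketch.
func loadConfig(lookup lookupFunc) Config {
	cfg := Config{
		BblfshEndpoint:   "127.0.0.1:9432",
		BlobsMaxSizeMiB:  5,
		BlobsAllowBinary: false,
	}
	if v, ok := lookup("BBLFSH_ENDPOINT"); ok && v != "" {
		cfg.BblfshEndpoint = v
	}
	if v, ok := lookup("GITBASE_BLOBS_MAX_SIZE"); ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			cfg.BlobsMaxSizeMiB = n
		}
	}
	if v, ok := lookup("GITBASE_BLOBS_ALLOW_BINARY"); ok {
		if b, err := strconv.ParseBool(v); err == nil {
			cfg.BlobsAllowBinary = b
		}
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", loadConfig(os.LookupEnv))
}
```

Passing the lookup function as a parameter keeps the parsing logic independent of the process environment, which makes it easy to try with a plain map.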

Tables

You can execute the SHOW TABLES statement to get a list of the available tables. To get all the columns and types of a specific table, you can write DESCRIBE TABLE [tablename].

gitbase exposes the following tables:

Name          Columns
repositories  id
remotes       repository_id, name, push_url, fetch_url, push_refspec, fetch_refspec
commits       hash, author_name, author_email, author_when, committer_name, committer_email, committer_when, message, tree_hash
blobs         hash, size, content
refs          repository_id, name, hash
tree_entries  tree_hash, entry_hash, mode, name
references    repository_id, name, hash
Functions

To make some common tasks easier, gitbase provides functions that operate on the previously mentioned tables:

Name                                          Description
commit_has_blob(commit_hash, blob_hash) bool  reports whether the specified commit contains the specified blob
commit_has_tree(commit_hash, tree_hash) bool  reports whether the specified commit contains the specified tree
history_idx(start_hash, target_hash) int      returns the index of a commit in the history of another commit
is_remote(reference_name) bool                reports whether the given reference name belongs to a remote
is_tag(reference_name) bool                   reports whether the given reference name is a tag
language(path, [blob]) text                   returns the language of a file given its path and, optionally, its content
uast(blob, [lang, [xpath]]) json_blob         returns an array of UAST nodes as blobs
uast_xpath(json_blob, xpath)                  performs an XPath query over the given UAST nodes
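
To make the intent of is_remote and is_tag concrete, the following Go sketch classifies reference names by Git's conventional namespaces (refs/remotes/ and refs/tags/). The helper names isRemote and isTag are hypothetical, and whether gitbase implements its SQL functions exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// isRemote reports whether a full reference name lives under Git's
// conventional refs/remotes/ namespace.
func isRemote(refName string) bool {
	return strings.HasPrefix(refName, "refs/remotes/")
}

// isTag reports whether a full reference name lives under Git's
// conventional refs/tags/ namespace.
func isTag(refName string) bool {
	return strings.HasPrefix(refName, "refs/tags/")
}

func main() {
	names := []string{
		"HEAD",
		"refs/heads/master",
		"refs/remotes/origin/master",
		"refs/tags/v0.11.0",
	}
	for _, name := range names {
		fmt.Printf("%-28s remote=%-5t tag=%t\n", name, isRemote(name), isTag(name))
	}
}
```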

Unstable features

  • Table squashing: an optimization that collects inner joins between tables with a set of supported conditions and converts them into a single node that retrieves the data in chained steps (for example, getting the commits first and then the blobs of every commit, instead of joining all commits with all blobs). It can be enabled with the environment variable GITBASE_UNSTABLE_SQUASH_ENABLE.
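
The difference between the two strategies can be pictured with a toy model. The Go sketch below is not gitbase's implementation: the commit type and the joinAll and chained functions are hypothetical, and the only goal is to compare how many commit/blob pairs each strategy has to examine.

```go
package main

import "fmt"

// commit is a toy stand-in for a commit row: a hash plus the hashes of
// the blobs it contains.
type commit struct {
	hash  string
	blobs []string
}

// joinAll models the plain join: every commit is paired with every blob
// in the repository and only the matching pairs are kept. It returns the
// matches and the number of pairs it had to examine.
func joinAll(commits []commit, allBlobs []string) (pairs [][2]string, examined int) {
	for _, c := range commits {
		owned := make(map[string]bool, len(c.blobs))
		for _, b := range c.blobs {
			owned[b] = true
		}
		for _, b := range allBlobs {
			examined++
			if owned[b] {
				pairs = append(pairs, [2]string{c.hash, b})
			}
		}
	}
	return pairs, examined
}

// chained models the chained retrieval: iterate the commits first, then
// visit only the blobs that belong to each commit.
func chained(commits []commit) (pairs [][2]string, examined int) {
	for _, c := range commits {
		for _, b := range c.blobs {
			examined++
			pairs = append(pairs, [2]string{c.hash, b})
		}
	}
	return pairs, examined
}

func main() {
	commits := []commit{
		{"c1", []string{"b1", "b2"}},
		{"c2", []string{"b2", "b3"}},
	}
	allBlobs := []string{"b1", "b2", "b3", "b4", "b5"}

	p1, n1 := joinAll(commits, allBlobs)
	p2, n2 := chained(commits)
	fmt.Printf("join all: %d matches, %d pairs examined\n", len(p1), n1)
	fmt.Printf("chained:  %d matches, %d pairs examined\n", len(p2), n2)
}
```

Both strategies produce the same matches in this model; the chained version simply never looks at blobs that do not belong to the commit at hand.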

Examples

Get all the HEAD references from all the repositories

SELECT * FROM refs WHERE name = 'HEAD'

Commits that appear in more than one reference

SELECT * FROM (
	SELECT COUNT(c.hash) AS num, c.hash
	FROM refs r
	INNER JOIN commits c
		ON history_idx(r.hash, c.hash) >= 0
	GROUP BY c.hash
) t WHERE num > 1

Get the number of blobs per HEAD commit

SELECT COUNT(c.hash), c.hash
FROM refs r
INNER JOIN commits c
	ON r.name = 'HEAD' AND history_idx(r.hash, c.hash) >= 0
INNER JOIN blobs b
	ON commit_has_blob(c.hash, b.hash)
GROUP BY c.hash

Get commits per committer, per month in 2015

SELECT COUNT(*) as num_commits, month, repo_id, committer_email
	FROM (
		SELECT
			MONTH(committer_when) as month,
			r.id as repo_id,
			committer_email
		FROM repositories r
		INNER JOIN refs ON refs.repository_id = r.id AND refs.name = 'HEAD'
		INNER JOIN commits c ON YEAR(committer_when) = 2015 AND history_idx(refs.hash, c.hash) >= 0
	) as t
GROUP BY committer_email, month, repo_id
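
Several of the queries above use history_idx(start, target) >= 0 as a "target is reachable from start" test. The Go sketch below shows one way such an index could be computed over a toy commit graph. The graph type, the traversal order (depth-first, first parent first), and the choice of 0 for the starting commit are all assumptions of this example; the only property taken from the queries is that a negative result means the commit is not in the history.

```go
package main

import "fmt"

// graph maps a commit hash to the hashes of its parents (a toy model of
// a commit history).
type graph map[string][]string

// historyIdx returns the position of target in the history reachable
// from start, counting start itself as index 0, or -1 when target is not
// reachable. The visiting order is an assumption of this sketch.
func historyIdx(g graph, start, target string) int {
	seen := map[string]bool{}
	stack := []string{start}
	idx := 0
	for len(stack) > 0 {
		h := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[h] {
			continue
		}
		seen[h] = true
		if h == target {
			return idx
		}
		idx++
		// Push parents in reverse so the first parent is visited first.
		parents := g[h]
		for i := len(parents) - 1; i >= 0; i-- {
			stack = append(stack, parents[i])
		}
	}
	return -1
}

func main() {
	g := graph{
		"c3": {"c2"},
		"c2": {"c1"},
		"c1": nil,
	}
	fmt.Println(historyIdx(g, "c3", "c3"))
	fmt.Println(historyIdx(g, "c3", "c1"))
	fmt.Println(historyIdx(g, "c1", "c3"))
}
```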

License

gitbase is licensed under the Apache 2.0 License.

Documentation

Constants

const (
	// ReferencesTableName is the name of the refs table.
	ReferencesTableName = "refs"
	// CommitsTableName is the name of the commits table.
	CommitsTableName = "commits"
	// BlobsTableName is the name of the blobs table.
	BlobsTableName = "blobs"
	// TreeEntriesTableName is the name of the tree entries table.
	TreeEntriesTableName = "tree_entries"
	// RepositoriesTableName is the name of the repositories table.
	RepositoriesTableName = "repositories"
	// RemotesTableName is the name of the remotes table.
	RemotesTableName = "remotes"
)

Variables

var BlobsSchema = sql.Schema{
	{Name: "hash", Type: sql.Text, Nullable: false, Source: BlobsTableName},
	{Name: "size", Type: sql.Int64, Nullable: false, Source: BlobsTableName},
	{Name: "content", Type: sql.Blob, Nullable: false, Source: BlobsTableName},
}

BlobsSchema is the schema for the blobs table.

var CommitsSchema = sql.Schema{
	{Name: "hash", Type: sql.Text, Nullable: false, Source: CommitsTableName},
	{Name: "author_name", Type: sql.Text, Nullable: false, Source: CommitsTableName},
	{Name: "author_email", Type: sql.Text, Nullable: false, Source: CommitsTableName},
	{Name: "author_when", Type: sql.Timestamp, Nullable: false, Source: CommitsTableName},
	{Name: "committer_name", Type: sql.Text, Nullable: false, Source: CommitsTableName},
	{Name: "committer_email", Type: sql.Text, Nullable: false, Source: CommitsTableName},
	{Name: "committer_when", Type: sql.Timestamp, Nullable: false, Source: CommitsTableName},
	{Name: "message", Type: sql.Text, Nullable: false, Source: CommitsTableName},
	{Name: "tree_hash", Type: sql.Text, Nullable: false, Source: CommitsTableName},
	{Name: "parents", Type: sql.Array(sql.Text), Nullable: false, Source: CommitsTableName},
}

CommitsSchema is the schema for the commits table.

var ErrBblfshConnection = errors.NewKind("unable to establish a new bblfsh connection")

ErrBblfshConnection is returned when it's impossible to connect to bblfsh.

var ErrInvalidContext = errors.NewKind("invalid context received: %v")

ErrInvalidContext is returned when some node expected an sql.Context with gitbase session but received something else.

var ErrInvalidGitbaseSession = errors.NewKind("expecting gitbase session, but received: %T")

ErrInvalidGitbaseSession is returned when some node expected a gitbase session but received something else.
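
The kind of check that leads to such an error can be sketched with the standard library alone. In the Go example below, gitbaseSession, otherSession, and asGitbaseSession are hypothetical stand-ins, and the error is built with errors.New and fmt.Errorf rather than with the errors.NewKind helper used by the package; only the message format is taken from the declaration above.

```go
package main

import (
	"errors"
	"fmt"
)

// gitbaseSession and otherSession are hypothetical stand-ins for a
// gitbase session and for any other session implementation.
type gitbaseSession struct{ skipGitErrors bool }
type otherSession struct{}

// errInvalidSession plays the role of ErrInvalidGitbaseSession here.
var errInvalidSession = errors.New("expecting gitbase session")

// asGitbaseSession returns s as a *gitbaseSession, or an error that
// names the concrete type that was received instead (via %T).
func asGitbaseSession(s interface{}) (*gitbaseSession, error) {
	gs, ok := s.(*gitbaseSession)
	if !ok {
		return nil, fmt.Errorf("%w, but received: %T", errInvalidSession, s)
	}
	return gs, nil
}

func main() {
	if _, err := asGitbaseSession(otherSession{}); err != nil {
		fmt.Println(err)
		fmt.Println(errors.Is(err, errInvalidSession))
	}
	if gs, err := asGitbaseSession(&gitbaseSession{skipGitErrors: true}); err == nil {
		fmt.Println(gs.skipGitErrors)
	}
}
```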

var ErrSessionCanceled = errors.NewKind("session canceled")

ErrSessionCanceled is returned when the session context is canceled.

var RefsSchema = sql.Schema{
	{Name: "repository_id", Type: sql.Text, Nullable: false, Source: ReferencesTableName},
	{Name: "name", Type: sql.Text, Nullable: false, Source: ReferencesTableName},
	{Name: "hash", Type: sql.Text, Nullable: false, Source: ReferencesTableName},
}

RefsSchema is the schema for the refs table.

var RemotesSchema = sql.Schema{
	{Name: "repository_id", Type: sql.Text, Nullable: false, Source: RemotesTableName},
	{Name: "name", Type: sql.Text, Nullable: false, Source: RemotesTableName},
	{Name: "push_url", Type: sql.Text, Nullable: false, Source: RemotesTableName},
	{Name: "fetch_url", Type: sql.Text, Nullable: false, Source: RemotesTableName},
	{Name: "push_refspec", Type: sql.Text, Nullable: false, Source: RemotesTableName},
	{Name: "fetch_refspec", Type: sql.Text, Nullable: false, Source: RemotesTableName},
}

RemotesSchema is the schema for the remotes table.

var RepositoriesSchema = sql.Schema{
	{Name: "id", Type: sql.Text, Nullable: false, Source: RepositoriesTableName},
}

RepositoriesSchema is the schema for the repositories table.

var TreeEntriesSchema = sql.Schema{
	{Name: "tree_hash", Type: sql.Text, Nullable: false, Source: TreeEntriesTableName},
	{Name: "entry_hash", Type: sql.Text, Nullable: false, Source: TreeEntriesTableName},
	{Name: "mode", Type: sql.Text, Nullable: false, Source: TreeEntriesTableName},
	{Name: "name", Type: sql.Text, Nullable: false, Source: TreeEntriesTableName},
}

TreeEntriesSchema is the schema for the tree entries table.

Functions

func NewDatabase added in v0.6.0

func NewDatabase(name string) sql.Database

NewDatabase creates a new Database structure with the given name and initializes its tables.

func NewRowRepoIter added in v0.10.0

func NewRowRepoIter(
	ctx *sql.Context,
	iter RowRepoIter,
) (sql.RowIter, error)

NewRowRepoIter initializes a new repository iterator.

  • ctx: it should contain a gitbase.Session
  • iter: specific RowRepoIter interface
      • NewIterator: called when a new repository is about to be iterated, returns a new RowRepoIter
      • Next: called for each row
      • Close: called when a repository finished iterating
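
The lifecycle described above (NewIterator per repository, Next per row, Close when the repository is done) can be sketched with local stand-in types. In the Go example below, Row, Repository, refNamesIter, and collect are hypothetical and do not use the gitbase or go-mysql-server packages; signalling the end of a repository with io.EOF is also an assumption of this sketch.

```go
package main

import (
	"fmt"
	"io"
)

// Row and Repository are simplified stand-ins for sql.Row and
// gitbase.Repository.
type Row []interface{}

type Repository struct {
	ID   string
	Refs []string
}

// RowRepoIter mirrors the shape of the interface documented above, using
// the local stand-in types.
type RowRepoIter interface {
	NewIterator(*Repository) (RowRepoIter, error)
	Next() (Row, error)
	Close() error
}

// refNamesIter yields one row per reference name of a repository.
type refNamesIter struct {
	repo *Repository
	pos  int
}

func (i *refNamesIter) NewIterator(r *Repository) (RowRepoIter, error) {
	return &refNamesIter{repo: r}, nil
}

func (i *refNamesIter) Next() (Row, error) {
	if i.repo == nil || i.pos >= len(i.repo.Refs) {
		return nil, io.EOF // assumption: io.EOF marks the end of a repository
	}
	name := i.repo.Refs[i.pos]
	i.pos++
	return Row{i.repo.ID, name}, nil
}

func (i *refNamesIter) Close() error { return nil }

// collect drives the lifecycle: one NewIterator per repository, Next
// until io.EOF, then Close.
func collect(repos []*Repository, proto RowRepoIter) ([]Row, error) {
	var rows []Row
	for _, r := range repos {
		it, err := proto.NewIterator(r)
		if err != nil {
			return nil, err
		}
		for {
			row, err := it.Next()
			if err == io.EOF {
				break
			}
			if err != nil {
				it.Close()
				return nil, err
			}
			rows = append(rows, row)
		}
		if err := it.Close(); err != nil {
			return nil, err
		}
	}
	return rows, nil
}

func main() {
	repos := []*Repository{
		{ID: "repo-a", Refs: []string{"HEAD", "refs/heads/master"}},
		{ID: "repo-b", Refs: []string{"HEAD"}},
	}
	rows, err := collect(repos, &refNamesIter{})
	fmt.Println(rows, err)
}
```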

func NewSessionBuilder added in v0.10.0

func NewSessionBuilder(pool *RepositoryPool, opts ...SessionOption) server.SessionBuilder

NewSessionBuilder creates a SessionBuilder with the given Repository Pool.

Types

type BlobsIter added in v0.11.0

type BlobsIter interface {
	ChainableIter
}

BlobsIter is a chainable iterator that operates on blobs.

func NewCommitBlobsIter added in v0.11.0

func NewCommitBlobsIter(
	commits CommitsIter,
	filters sql.Expression,
	readContent bool,
) BlobsIter

NewCommitBlobsIter returns an iterator that will return all blobs for the commit in the given iter that match the given filters.

func NewTreeEntryBlobsIter added in v0.11.0

func NewTreeEntryBlobsIter(
	treeEntriesIter TreeEntriesIter,
	filters sql.Expression,
	readContent bool,
) BlobsIter

NewTreeEntryBlobsIter returns an iterator that will return all blobs for the tree entries in the given iter that match the given filters.

type ChainableIter added in v0.11.0

type ChainableIter interface {
	// New creates a new Chainable Iterator.
	New(*sql.Context, *Repository) (ChainableIter, error)
	// Close closes the iterator.
	Close() error
	// Row returns the current row. All calls to Row return the same row
	// until another call to Advance. Advance should be called before
	// calling Row.
	Row() sql.Row
	// Advance advances the position of the iterator by one. After io.EOF
	// or any other error, this method should not be called.
	Advance() error
	// Schema returns the schema of the rows returned by this iterator.
	Schema() sql.Schema
}

ChainableIter is an iterator meant to have a chaining-friendly API.
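
The Advance/Row contract described in the interface comments can be exercised with a small consumer loop. The Go sketch below uses a reduced, local version of the interface (the chainable interface, the Row type, sliceIter, and drain are hypothetical names) and does not depend on gitbase itself.

```go
package main

import (
	"fmt"
	"io"
)

// Row is a simplified stand-in for sql.Row.
type Row []interface{}

// chainable is a reduced version of the contract above: only the methods
// needed to consume rows.
type chainable interface {
	Advance() error
	Row() Row
	Close() error
}

// sliceIter is a toy iterator over an in-memory slice of rows.
type sliceIter struct {
	rows []Row
	pos  int // number of successful Advance calls so far
}

// Advance moves to the next row, or returns io.EOF when none is left.
func (s *sliceIter) Advance() error {
	if s.pos >= len(s.rows) {
		return io.EOF
	}
	s.pos++
	return nil
}

// Row returns the current row; it is only valid after a successful call
// to Advance and keeps returning the same row until the next Advance.
func (s *sliceIter) Row() Row { return s.rows[s.pos-1] }

func (s *sliceIter) Close() error { return nil }

// drain consumes the iterator following the contract: Advance first,
// then Row, stopping at io.EOF or at the first error, and always closing.
func drain(it chainable) (rows []Row, err error) {
	defer func() {
		if cerr := it.Close(); err == nil {
			err = cerr
		}
	}()
	for {
		if err := it.Advance(); err != nil {
			if err == io.EOF {
				return rows, nil
			}
			return nil, err
		}
		rows = append(rows, it.Row())
	}
}

func main() {
	rows, err := drain(&sliceIter{rows: []Row{{"a"}, {"b"}}})
	fmt.Println(rows, err)
}
```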

type CommitsIter added in v0.11.0

type CommitsIter interface {
	ChainableIter
	// Commit returns the current commit. All calls to Commit return the
	// same commit until another call to Advance. Advance should
	// be called before calling Commit.
	Commit() *object.Commit
}

CommitsIter is a chainable iterator that operates on commits.

func NewAllCommitsIter added in v0.11.0

func NewAllCommitsIter(filters sql.Expression) CommitsIter

NewAllCommitsIter returns an iterator that will return all commits that match the given filters.

func NewRefCommitsIter added in v0.11.0

func NewRefCommitsIter(
	refsIter RefsIter,
	filters sql.Expression,
) CommitsIter

NewRefCommitsIter returns an iterator that will return all commits for the given iter references that match the given filters. If the iterator is virtual, it will not append its columns to the final row.

func NewRefHEADCommitsIter added in v0.11.0

func NewRefHEADCommitsIter(
	refsIter RefsIter,
	filters sql.Expression,
	virtual bool,
) CommitsIter

NewRefHEADCommitsIter returns an iterator that will return the commit for the given iter reference heads that match the given filters.

type Database added in v0.6.0

type Database struct {
	// contains filtered or unexported fields
}

Database holds all git repository tables

func (*Database) Name added in v0.6.0

func (d *Database) Name() string

Name returns the name of the database

func (*Database) Tables added in v0.6.0

func (d *Database) Tables() map[string]sql.Table

Tables returns a map with all initialized tables

type Ref added in v0.11.0

type Ref struct {
	RepoID string
	*plumbing.Reference
}

Ref is a git reference with the repo id.

type RefsIter added in v0.11.0

type RefsIter interface {
	ChainableIter
	// Ref returns the current reference. All calls to Ref return the
	// same reference until another call to Advance. Advance should
	// be called before calling Ref.
	Ref() *Ref
}

RefsIter is a chainable iterator that operates on references.

func NewAllRefsIter added in v0.11.0

func NewAllRefsIter(filters sql.Expression) RefsIter

NewAllRefsIter returns an iterator that will return all references that match the given filters.

func NewRemoteRefsIter added in v0.11.0

func NewRemoteRefsIter(
	remotesIter RemotesIter,
	filters sql.Expression,
) RefsIter

NewRemoteRefsIter returns an iterator that will return all references for the remotes returned by the given remotes iterator that match the given filters.

func NewRepoRefsIter added in v0.11.0

func NewRepoRefsIter(
	reposIter ReposIter,
	filters sql.Expression,
) RefsIter

NewRepoRefsIter returns an iterator that will return all references for the repositories of the given repos iterator that match the given filters.

type Remote added in v0.11.0

type Remote struct {
	RepoID string
	Name   string
	URL    string
	Fetch  string
}

Remote is the info of a single repository remote.

type RemotesIter added in v0.11.0

type RemotesIter interface {
	ChainableIter
	// Remote returns the current remote. All calls to Remote return the
	// same remote until another call to Advance. Advance should
	// be called before calling Remote.
	Remote() *Remote
}

RemotesIter is a chainable iterator that operates with remotes.

func NewAllRemotesIter added in v0.11.0

func NewAllRemotesIter(filters sql.Expression) RemotesIter

NewAllRemotesIter returns an iterator that will return all remotes that match the given filters.

func NewRepoRemotesIter added in v0.11.0

func NewRepoRemotesIter(reposIter ReposIter, filters sql.Expression) RemotesIter

NewRepoRemotesIter returns an iterator that will return all remotes for the given ReposIter repositories that match the given filters.

type ReposIter added in v0.11.0

type ReposIter interface {
	ChainableIter
	// Repo returns the current repository. All calls to Repo return the
	// same repository until another call to Advance. Advance should
	// be called before calling Repo.
	Repo() *Repository
}

ReposIter is a chainable iterator that operates with repositories.

func NewAllReposIter added in v0.11.0

func NewAllReposIter(filters sql.Expression) ReposIter

NewAllReposIter returns an iterator that will return all repositories that match the given filters.

type Repository added in v0.10.0

type Repository struct {
	ID   string
	Repo *git.Repository
}

Repository struct holds an initialized repository and its ID

func NewRepository added in v0.10.0

func NewRepository(id string, repo *git.Repository) *Repository

NewRepository creates and initializes a new Repository structure

func NewRepositoryFromPath added in v0.10.0

func NewRepositoryFromPath(id, path string) (*Repository, error)

NewRepositoryFromPath creates and initializes a new Repository structure and initializes a go-git repository

func NewSivaRepositoryFromPath added in v0.11.0

func NewSivaRepositoryFromPath(id, path string) (*Repository, error)

NewSivaRepositoryFromPath creates and initializes a new Repository structure and initializes a go-git repository backed by a siva file.

type RepositoryIter added in v0.10.0

type RepositoryIter struct {
	// contains filtered or unexported fields
}

RepositoryIter iterates over all repositories in the pool

func (*RepositoryIter) Close added in v0.10.0

func (i *RepositoryIter) Close() error

Close finishes the iterator. It is a no-op.

func (*RepositoryIter) Next added in v0.10.0

func (i *RepositoryIter) Next() (*Repository, error)

Next retrieves the next Repository. It returns io.EOF as error when there are no more Repositories to retrieve.

type RepositoryPool added in v0.10.0

type RepositoryPool struct {
	// contains filtered or unexported fields
}

RepositoryPool holds a pool of git repository paths and the functionality to open and iterate them.

func NewRepositoryPool added in v0.10.0

func NewRepositoryPool() *RepositoryPool

NewRepositoryPool initializes a new RepositoryPool

func (*RepositoryPool) Add added in v0.10.0

func (p *RepositoryPool) Add(id, path string, kind repoKind)

Add inserts a new repository in the pool

func (*RepositoryPool) AddDir added in v0.10.0

func (p *RepositoryPool) AddDir(path string) error

AddDir adds all direct subdirectories from path as git repos.

func (*RepositoryPool) AddGit added in v0.10.0

func (p *RepositoryPool) AddGit(path string) (string, error)

AddGit checks if a git repository can be opened and adds it to the pool. It also sets its path as ID.

func (*RepositoryPool) AddSivaDir added in v0.11.0

func (p *RepositoryPool) AddSivaDir(path string) error

AddSivaDir adds to the repository pool all siva files found inside the given directory and in its children directories, but not the children of those directories.
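
The depth rule described above (the directory itself and its direct subdirectories, but nothing deeper) can be sketched with the standard library. The Go example below is not the gitbase implementation: findSivaFiles and demoFS are hypothetical helpers, the in-memory file system is only there to keep the example self-contained, and matching on the .siva extension is an assumption.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

// findSivaFiles returns the .siva files located directly in the root of
// fsys or in its immediate subdirectories, skipping anything deeper.
func findSivaFiles(fsys fs.FS) ([]string, error) {
	var found []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			// "x" has no slash (a child directory, descend into it);
			// "x/y" has one (a grandchild directory, skip it).
			if p != "." && strings.Count(p, "/") >= 1 {
				return fs.SkipDir
			}
			return nil
		}
		if path.Ext(p) == ".siva" {
			found = append(found, p)
		}
		return nil
	})
	return found, err
}

// demoFS builds a small in-memory tree to try the function on.
func demoFS() fs.FS {
	return fstest.MapFS{
		"a.siva":     &fstest.MapFile{},
		"notes.txt":  &fstest.MapFile{},
		"x/b.siva":   &fstest.MapFile{},
		"x/y/c.siva": &fstest.MapFile{},
	}
}

func main() {
	files, err := findSivaFiles(demoFS())
	fmt.Println(files, err)
}
```

In the demo tree, x/y/c.siva sits two levels below the root and is therefore left out.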

func (*RepositoryPool) GetPos added in v0.10.0

func (p *RepositoryPool) GetPos(pos int) (*Repository, error)

GetPos retrieves a repository at a given position. If the position is out of bounds it returns io.EOF.

func (*RepositoryPool) RepoIter added in v0.10.0

func (p *RepositoryPool) RepoIter() (*RepositoryIter, error)

RepoIter creates a new Repository iterator

type RowRepoIter added in v0.10.0

type RowRepoIter interface {
	NewIterator(*Repository) (RowRepoIter, error)
	Next() (sql.Row, error)
	Close() error
}

RowRepoIter is the interface needed by each iterator implementation

func NewChainableRowRepoIter added in v0.11.0

func NewChainableRowRepoIter(ctx *sql.Context, iter ChainableIter) RowRepoIter

NewChainableRowRepoIter creates a new RowRepoIter from a ChainableIter.

type Session added in v0.10.0

type Session struct {
	sql.Session
	Pool *RepositoryPool

	SkipGitErrors bool
	// contains filtered or unexported fields
}

Session is the custom implementation of a gitbase session.

func NewSession added in v0.10.0

func NewSession(pool *RepositoryPool, opts ...SessionOption) *Session

NewSession creates a new Session. It requires a repository pool and any number of session options can be passed to configure the session.

func (*Session) BblfshClient added in v0.11.0

func (s *Session) BblfshClient() (*bblfsh.Client, error)

BblfshClient returns a BblfshClient.

func (*Session) Close added in v0.11.0

func (s *Session) Close() error

Close implements the io.Closer interface.

type SessionOption added in v0.11.0

type SessionOption func(*Session)

SessionOption is a function that configures the session given some options.

func WithBblfshEndpoint added in v0.11.0

func WithBblfshEndpoint(endpoint string) SessionOption

WithBblfshEndpoint configures the bblfsh endpoint of the session.

func WithSkipGitErrors added in v0.11.0

func WithSkipGitErrors(enabled bool) SessionOption

WithSkipGitErrors changes how go-git errors are handled. When enabled, queries do not stop on git errors.
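
SessionOption follows the functional options pattern. The Go sketch below reproduces that pattern with local stand-in types (session, option, newSession, and the two with... helpers are hypothetical and lowercase to make clear they are not the gitbase API); using the documented BBLFSH_ENDPOINT default as the initial value is an assumption of this example.

```go
package main

import "fmt"

// session is a stand-in for gitbase.Session with just the fields that
// the options in this sketch configure.
type session struct {
	bblfshEndpoint string
	skipGitErrors  bool
}

// option is a function that configures a session, mirroring the shape of
// SessionOption.
type option func(*session)

// withBblfshEndpoint sets the bblfsh endpoint of the session.
func withBblfshEndpoint(endpoint string) option {
	return func(s *session) { s.bblfshEndpoint = endpoint }
}

// withSkipGitErrors toggles whether git errors are skipped.
func withSkipGitErrors(enabled bool) option {
	return func(s *session) { s.skipGitErrors = enabled }
}

// newSession builds a session with default values and then applies the
// options in the order they were given.
func newSession(opts ...option) *session {
	s := &session{bblfshEndpoint: "127.0.0.1:9432"}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	s := newSession(
		withBblfshEndpoint("10.0.0.5:9432"),
		withSkipGitErrors(true),
	)
	fmt.Printf("%+v\n", *s)
}
```

The pattern keeps the constructor signature stable: new settings can be added as new option functions without changing existing callers.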

type Table added in v0.11.0

type Table interface {
	sql.Table
	// contains filtered or unexported methods
}

Table represents a gitbase table.

type TreeEntriesIter added in v0.11.0

type TreeEntriesIter interface {
	ChainableIter
	// TreeEntry returns the current tree entry. All calls to TreeEntry return the
	// same tree entry until another call to Advance. Advance should
	// be called before calling TreeEntry.
	TreeEntry() *TreeEntry
}

TreeEntriesIter is a chainable iterator that operates on tree entries.

func NewAllTreeEntriesIter added in v0.11.0

func NewAllTreeEntriesIter(filters sql.Expression) TreeEntriesIter

NewAllTreeEntriesIter returns an iterator that will return all tree entries that match the given filters.

func NewCommitMainTreeEntriesIter added in v0.11.0

func NewCommitMainTreeEntriesIter(
	commitsIter CommitsIter,
	filters sql.Expression,
	virtual bool,
) TreeEntriesIter

NewCommitMainTreeEntriesIter returns an iterator that will return all tree entries for the main tree of the commits returned by the given commit iterator that match the given filters.

func NewCommitTreeEntriesIter added in v0.11.0

func NewCommitTreeEntriesIter(
	commitsIter CommitsIter,
	filters sql.Expression,
	virtual bool,
) TreeEntriesIter

NewCommitTreeEntriesIter returns an iterator that will return all tree entries for all trees of the commits returned by the given commit iterator that match the given filters.

type TreeEntry added in v0.11.0

type TreeEntry struct {
	TreeHash plumbing.Hash
	*object.File
}

TreeEntry is a tree entry object.

Directories

Path Synopsis
cmd
internal
