borges

package module
v0.19.0-beta.1
Published: Jan 14, 2019 License: GPL-3.0 Imports: 38 Imported by: 9

README

borges

borges collects and stores Git repositories.

I have always imagined that Paradise will be a kind of library.

Borges is a set of tools for collection and storage of Git repositories at large scale. It is a distributed system, similar to a search engine, that uses a custom repository storage file format and is optimized for saving storage space and keeping repositories up-to-date.

Further reading

From here, you can directly go to getting started.

It is also recommended to get familiar with the borges key concepts.

License

GPLv3, see LICENSE

Documentation

Overview

borges archives repositories in a universal git library.

The goal of borges is to fetch repositories and keep them updated. Repositories are arranged in a repository storage that contains one repository per init commit found.

We define a root commit as any commit with no parents (the first commit of a repository). Note that a repository can contain multiple root commits.

For each reference, we define its init commit as the root commit that is reached by following the first parent of each commit in the history. This is the commit that would be obtained with:

$ git rev-list --first-parent <ref> | tail -n 1

When borges fetches a repository, it groups all references by init commit and pushes each group of references to a repository for its init commit.
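
The grouping rule can be illustrated with a small in-memory model. This is a sketch, not borges code: the `commit` type, the `initCommit` and `groupByInit` helpers and the sample hashes are hypothetical, and it assumes each commit lists its parents with the first parent at index 0.

```go
package main

import "fmt"

// commit is a hypothetical, simplified commit: only its parent hashes matter.
type commit struct {
	parents []string
}

// initCommit follows the first parent of each commit, starting at hash, and
// returns the root commit where that walk ends.
func initCommit(graph map[string]commit, hash string) string {
	for {
		c := graph[hash]
		if len(c.parents) == 0 {
			return hash
		}
		hash = c.parents[0]
	}
}

// groupByInit groups reference names by the init commit of their target hash.
func groupByInit(graph map[string]commit, refs map[string]string) map[string][]string {
	groups := make(map[string][]string)
	for name, hash := range refs {
		root := initCommit(graph, hash)
		groups[root] = append(groups[root], name)
	}
	return groups
}

func main() {
	graph := map[string]commit{
		"01": {},
		"02": {},
		"11": {parents: []string{"01"}},
		"12": {parents: []string{"11", "02"}}, // merge: only "11" is followed
		"21": {parents: []string{"02"}},
	}
	refs := map[string]string{
		"refs/heads/master": "12",
		"refs/heads/other":  "21",
	}
	for root, names := range groupByInit(graph, refs) {
		fmt.Println(root, names)
	}
}
```

In this model the merge commit `12` has two root commits in its history (`01` and `02`), but its init commit is `01` because only the first parent is followed.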

Constants

const (
	FetchRefSpec = config.RefSpec("refs/*:refs/*")
	FetchHEAD    = config.RefSpec("HEAD:refs/heads/HEAD")
)
const TemporaryError = "temporary"

Variables

var (
	ErrCleanRepositoryDir      = errors.NewKind("cleaning up local repo dir failed")
	ErrClone                   = errors.NewKind("cloning %s failed")
	ErrPushToRootedRepository  = errors.NewKind("push to rooted repo %s failed")
	ErrArchivingRoots          = errors.NewKind("archiving %d out of %d roots failed: %s")
	ErrEndpointsEmpty          = errors.NewKind("endpoints is empty")
	ErrRepositoryIDNotFound    = errors.NewKind("repository id not found: %s")
	ErrChanges                 = errors.NewKind("error computing changes")
	ErrAlreadyFetching         = errors.NewKind("repository %s was already in a fetching status")
	ErrSetStatus               = errors.NewKind("unable to set repository to status: %s")
	ErrFatal                   = errors.NewKind("fatal, %v: stacktrace: %s")
	ErrCannotProcessRepository = errors.NewKind("cannot process repository")
	ErrProcessedWithErrors     = errors.NewKind("repository processed with errors")
	ErrEmptySiva               = errors.NewKind("unexpected empty siva file found")

	// StrRemoveTmpFiles is the string used to log when tmp files could not
	// be deleted.
	StrRemoveTmpFiles = "could not remove tmp files"
)
var (
	// ErrObjectTypeNotSupported returned by ResolveCommit when the referenced
	// object isn't a Commit nor a Tag.
	ErrObjectTypeNotSupported = errors.NewKind("object type %q not supported")
)

Functions

func CopyFile added in v0.19.0

func CopyFile(
	dst string,
	dstFS billy.Filesystem,
	src string,
	srcFS billy.Filesystem,
	mode os.FileMode,
) error

CopyFile makes a file copy with the specified permission.

func RecursiveCopy added in v0.19.0

func RecursiveCopy(
	dst string,
	dstFS billy.Filesystem,
	src string,
	srcFS billy.Filesystem,
) error

RecursiveCopy copies a directory to a destination path. It creates all needed directories if destination path does not exist.

func RepositoryID

func RepositoryID(endpoints []string, isFork *bool, storer RepositoryStore) (uuid.UUID, error)

RepositoryID tries to find a repository in the database by its endpoints. If no repository is found, it creates a new one and returns the ID.
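
The find-or-create behaviour can be sketched with an in-memory store. Everything here is hypothetical and simplified: `memStore`, `repo` and the integer IDs stand in for RepositoryStore, model.Repository and uuid.UUID, the isFork parameter is left out, and picking the first match when several repositories share an endpoint is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// repo is a hypothetical stand-in for a stored repository.
type repo struct {
	id        int
	endpoints []string
}

// memStore is a hypothetical in-memory store.
type memStore struct {
	repos []*repo
}

// getByEndpoints returns the repos that share at least one endpoint.
func (s *memStore) getByEndpoints(endpoints ...string) []*repo {
	var found []*repo
	for _, r := range s.repos {
		if sharesEndpoint(r, endpoints) {
			found = append(found, r)
		}
	}
	return found
}

func sharesEndpoint(r *repo, endpoints []string) bool {
	for _, a := range r.endpoints {
		for _, b := range endpoints {
			if a == b {
				return true
			}
		}
	}
	return false
}

// create stores a new repo with the next sequential ID.
func (s *memStore) create(endpoints []string) *repo {
	r := &repo{id: len(s.repos) + 1, endpoints: endpoints}
	s.repos = append(s.repos, r)
	return r
}

var errEndpointsEmpty = errors.New("endpoints is empty")

// repositoryID returns the ID of a repo matching the endpoints, creating
// the repo first when none matches.
func repositoryID(endpoints []string, s *memStore) (int, error) {
	if len(endpoints) == 0 {
		return 0, errEndpointsEmpty
	}
	if found := s.getByEndpoints(endpoints...); len(found) > 0 {
		return found[0].id, nil // assumption: the first match wins
	}
	return s.create(endpoints).id, nil
}

func main() {
	s := &memStore{}
	first, _ := repositoryID([]string{"https://example.com/a.git"}, s)
	again, _ := repositoryID([]string{"https://example.com/a.git", "git://example.com/a.git"}, s)
	other, _ := repositoryID([]string{"https://example.com/b.git"}, s)
	fmt.Println(first, again, other) // prints: 1 1 2
}
```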

func ResolveCommit

func ResolveCommit(r *git.Repository, h plumbing.Hash) (*object.Commit, error)

ResolveCommit gets the commit that a hash refers to, for example when the hash belongs to a tag that points to the commit. The only resolvable objects are Tags and Commits; if the object is neither, this function returns ErrObjectTypeNotSupported. The result is always a Commit.
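
The resolution rule can be sketched over a toy object database. This is an illustration under stated assumptions, not the go-git based implementation: the `object` type, the `objects` map and `resolveCommit` are hypothetical, and hashes are plain strings.

```go
package main

import (
	"errors"
	"fmt"
)

type objectType string

const (
	commitObject objectType = "commit"
	tagObject    objectType = "tag"
	blobObject   objectType = "blob"
)

// object is a hypothetical, simplified git object.
type object struct {
	typ    objectType
	target string // only meaningful for tags: the hash the tag points to
}

var errObjectTypeNotSupported = errors.New("object type not supported")

// resolveCommit peels tags until it reaches a commit and returns its hash.
// Any other object type yields errObjectTypeNotSupported.
func resolveCommit(objects map[string]object, hash string) (string, error) {
	for {
		obj, ok := objects[hash]
		if !ok {
			return "", fmt.Errorf("object %s not found", hash)
		}
		switch obj.typ {
		case commitObject:
			return hash, nil
		case tagObject:
			hash = obj.target // a tag may point to another tag
		default:
			return "", fmt.Errorf("%w: %q", errObjectTypeNotSupported, obj.typ)
		}
	}
}

func main() {
	objects := map[string]object{
		"c1": {typ: commitObject},
		"t1": {typ: tagObject, target: "c1"},
		"b1": {typ: blobObject},
	}
	hash, err := resolveCommit(objects, "t1")
	fmt.Println(hash, err) // prints: c1 <nil>
	_, err = resolveCommit(objects, "b1")
	fmt.Println(errors.Is(err, errObjectTypeNotSupported)) // prints: true
}
```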

func StoreConfig added in v0.6.0

func StoreConfig(r *git.Repository, mr *model.Repository) error

Types

type Action

type Action string
const (
	Create  Action = "create"
	Update  Action = "update"
	Delete  Action = "delete"
	Invalid Action = "invalid"
)

type Archiver

type Archiver struct {

	// TemporaryCloner is used to clone repositories into temporary storage.
	TemporaryCloner TemporaryCloner
	// Timeout is the deadline to cancel a job.
	Timeout time.Duration
	// Store is the component where repository models are stored.
	Store RepositoryStore
	// RootedTransactioner is used to push new references to our repository
	// storage.
	RootedTransactioner repository.RootedTransactioner
	// LockSession is a locker service to prevent concurrent access to the same
	// rooted repositories.
	LockSession lock.Session
	// Copier has the same copier struct as RootedTransactioner. Used to
	// directly copy sivas to remote.
	Copier *repository.Copier
	// contains filtered or unexported fields
}

Archiver archives repositories. Archiver instances are thread-safe and can be reused.

See borges documentation for more details about the archiving rules.

func (*Archiver) Do

func (a *Archiver) Do(ctx context.Context, j *Job) error

Do archives a repository according to a job.

type Changes

type Changes map[model.SHA1][]*Command

Changes represents the actions to perform on our rooted repositories. The map key is the hash of an init commit, and the value is a slice of Command, each of which can add a new reference, delete a reference or update the hash a reference points to.

func NewChanges

func NewChanges(old, new Referencer) (Changes, error)

NewChanges returns the Changes needed to obtain the current state of the repository from a set of old references. Each change can be a create, an update or a delete. If an old reference has the same name as a new one but a different init commit, the changes will contain a delete command and a create command. If a new reference has more than one init commit, at least one create command per init commit will be created.

Here are all possible cases for up to one reference. We use the notation a<11,01> to refer to reference 'a', pointing to hash '11' with init commit '01'.

Old		New		Changes
---		---		-------
Ø		Ø		Ø
Ø		a<11,01>	01 -> c<a,11>
a<11,01>	Ø		01 -> d<a,11>
a<11,01>	a<12,01>	01 -> u<a,11,12>
a<11,01>	a<11,02>	01 -> d<a,11> | 02 -> c<a,11> (invalid)
a<11,01>	a<12,02>	01 -> d<a,11> | 02 -> c<a,12>
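
The table can be reproduced with a short sketch. It is not the borges implementation: `ref`, `command`, `changes` and `newChanges` are hypothetical local types that mirror the notation above, and the sketch assumes each reference has exactly one init commit.

```go
package main

import "fmt"

// ref mirrors the a<hash,init> notation. Hypothetical, not model.Reference.
type ref struct {
	name, hash, initCommit string
}

// command holds the previous and/or next state of one reference.
type command struct {
	prev, next *ref
}

// String renders a command in the c<>, d<> and u<> notation of the table.
func (c command) String() string {
	switch {
	case c.prev == nil:
		return fmt.Sprintf("c<%s,%s>", c.next.name, c.next.hash)
	case c.next == nil:
		return fmt.Sprintf("d<%s,%s>", c.prev.name, c.prev.hash)
	default:
		return fmt.Sprintf("u<%s,%s,%s>", c.prev.name, c.prev.hash, c.next.hash)
	}
}

// changes maps an init commit hash to the commands for its rooted repository.
type changes map[string][]command

func newChanges(oldRefs, newRefs []ref) changes {
	c := make(changes)
	newByName := make(map[string]*ref, len(newRefs))
	for i := range newRefs {
		newByName[newRefs[i].name] = &newRefs[i]
	}
	seen := make(map[string]bool, len(oldRefs))
	for i := range oldRefs {
		o := &oldRefs[i]
		seen[o.name] = true
		n, ok := newByName[o.name]
		switch {
		case !ok: // the reference disappeared
			c[o.initCommit] = append(c[o.initCommit], command{prev: o})
		case n.initCommit != o.initCommit: // moved to another rooted repository
			c[o.initCommit] = append(c[o.initCommit], command{prev: o})
			c[n.initCommit] = append(c[n.initCommit], command{next: n})
		case n.hash != o.hash: // same rooted repository, new target
			c[o.initCommit] = append(c[o.initCommit], command{prev: o, next: n})
		}
	}
	for i := range newRefs {
		if n := &newRefs[i]; !seen[n.name] {
			c[n.initCommit] = append(c[n.initCommit], command{next: n})
		}
	}
	return c
}

func main() {
	// Last row of the table: a<11,01> becomes a<12,02>.
	c := newChanges([]ref{{"a", "11", "01"}}, []ref{{"a", "12", "02"}})
	fmt.Println("01 ->", c["01"]) // prints: 01 -> [d<a,11>]
	fmt.Println("02 ->", c["02"]) // prints: 02 -> [c<a,12>]
}
```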

func (Changes) Add

func (c Changes) Add(new *model.Reference)

func (Changes) Delete

func (c Changes) Delete(old *model.Reference)

func (Changes) Update

func (c Changes) Update(old, new *model.Reference)

type Command

type Command struct {
	Old *model.Reference
	New *model.Reference
}

Command is the way to represent a change to a reference. It can be:

- Create: a new reference is created.
- Update: a previous reference is updated. This means its head changes.
- Delete: a previous reference does not exist now.

func (*Command) Action

func (c *Command) Action() Action

Action returns the action related to this command, depending on its content.
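
One plausible way to derive the action from the content of a command is to look at which of Old and New are set. The mapping below is an assumption made for illustration, with hypothetical lowercase types; the actual rules may differ.

```go
package main

import "fmt"

type action string

const (
	actionCreate  action = "create"
	actionUpdate  action = "update"
	actionDelete  action = "delete"
	actionInvalid action = "invalid"
)

// reference and command are hypothetical stand-ins for model.Reference
// and Command.
type reference struct {
	name, hash string
}

type command struct {
	prev, next *reference
}

// action derives the action from which sides of the command are set
// (assumed mapping, not taken from the borges source).
func (c *command) action() action {
	switch {
	case c.prev == nil && c.next == nil:
		return actionInvalid
	case c.prev == nil:
		return actionCreate
	case c.next == nil:
		return actionDelete
	default:
		return actionUpdate
	}
}

func main() {
	a11 := &reference{name: "a", hash: "11"}
	a12 := &reference{name: "a", hash: "12"}
	fmt.Println((&command{next: a11}).action())            // prints: create
	fmt.Println((&command{prev: a11, next: a12}).action()) // prints: update
	fmt.Println((&command{prev: a11}).action())            // prints: delete
	fmt.Println((&command{}).action())                     // prints: invalid
}
```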

type Consumer

type Consumer struct {
	Notifiers struct {
		QueueError func(error)
	}
	WorkerPool *WorkerPool
	Queue      queue.Queue
	// contains filtered or unexported fields
}

Consumer consumes jobs from a queue and uses multiple workers to process them.

func NewConsumer

func NewConsumer(queue queue.Queue, pool *WorkerPool) *Consumer

NewConsumer creates a new consumer.

func (*Consumer) Start

func (c *Consumer) Start() error

Start initializes the consumer and starts it, blocking until it is stopped.

func (*Consumer) Stop

func (c *Consumer) Stop()

Stop stops the consumer. Note that it does not close the underlying queue and worker pool. It blocks until the consumer has actually stopped.

type Executor added in v0.8.0

type Executor struct {
	// contains filtered or unexported fields
}

Executor retrieves jobs from a job iterator and passes them to a worker pool to be executed. Executor acts as a producer-consumer in a single component.

func NewExecutor added in v0.8.0

func NewExecutor(
	q queue.Queue,
	pool *WorkerPool,
	store RepositoryStore, iter JobIter,
) *Executor

NewExecutor creates a new job executor.

func (*Executor) Execute added in v0.8.0

func (p *Executor) Execute() error

Execute will queue all jobs and distribute them across the worker pool for them to be executed.

type Job

type Job struct {
	RepositoryID uuid.UUID
}

Job represents a borges job to fetch and archive a repository.

type JobIter

type JobIter interface {
	io.Closer
	// Next returns the next job. It returns io.EOF if there are no more jobs.
	Next() (*Job, error)
}

JobIter is an iterator of Job.

func NewLineJobIter

func NewLineJobIter(r io.ReadCloser, storer RepositoryStore) JobIter

NewLineJobIter returns a JobIter that returns jobs generated from a reader with a list of repository URLs, one per line.
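
The iterator contract (one job per call to Next, io.EOF when the input is exhausted) can be sketched with the standard library alone. The names below are hypothetical: this `job` carries the URL directly instead of a RepositoryID, no RepositoryStore is involved, and skipping blank lines is an assumption of the sketch.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// job is a hypothetical, simplified job that only carries the URL.
type job struct {
	url string
}

// lineJobIter yields one job per non-empty line of the reader.
type lineJobIter struct {
	r io.ReadCloser
	s *bufio.Scanner
}

func newLineJobIter(r io.ReadCloser) *lineJobIter {
	return &lineJobIter{r: r, s: bufio.NewScanner(r)}
}

// Next returns the next job, or io.EOF when there are no more lines.
func (i *lineJobIter) Next() (*job, error) {
	for i.s.Scan() {
		line := strings.TrimSpace(i.s.Text())
		if line == "" {
			continue // assumption: blank lines are skipped
		}
		return &job{url: line}, nil
	}
	if err := i.s.Err(); err != nil {
		return nil, err
	}
	return nil, io.EOF
}

// Close closes the underlying reader.
func (i *lineJobIter) Close() error {
	return i.r.Close()
}

// urlsFromText drains an iterator built over text and returns the job URLs.
func urlsFromText(text string) ([]string, error) {
	iter := newLineJobIter(io.NopCloser(strings.NewReader(text)))
	defer iter.Close()
	var urls []string
	for {
		j, err := iter.Next()
		if err == io.EOF {
			return urls, nil
		}
		if err != nil {
			return nil, err
		}
		urls = append(urls, j.url)
	}
}

func main() {
	urls, err := urlsFromText("https://example.com/a.git\n\nhttps://example.com/b.git\n")
	fmt.Println(len(urls), err) // prints: 2 <nil>
}
```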

func NewMentionJobIter

func NewMentionJobIter(q queue.Queue, storer RepositoryStore) JobIter

NewMentionJobIter returns a JobIter that returns jobs generated from mentions received from a queue (e.g. from rovers).

type Producer

type Producer struct {
	// contains filtered or unexported fields
}

Producer is a service that generates jobs and puts them on the queue.

func NewProducer

func NewProducer(i JobIter, q queue.Queue, p queue.Priority, jobRetries int) *Producer

NewProducer creates a new producer.

func (*Producer) Start

func (p *Producer) Start()

Start starts the producer services. It blocks until Stop is called.

func (*Producer) Stop

func (p *Producer) Stop()

Stop stops the producer.

type Referencer

type Referencer interface {
	// References retrieves a slice of *model.Reference or an error.
	References() ([]*model.Reference, error)
}

Referencer can retrieve reference models (*model.Reference).

func NewGitReferencer

func NewGitReferencer(r *git.Repository) Referencer

NewGitReferencer takes a *git.Repository and returns a Referencer that retrieves any valid reference from it. Symbolic references and references that do not point to commits (possibly through a tag) are silently ignored. It might return an error if any operation fails in the underlying repository.

func NewModelReferencer

func NewModelReferencer(r *model.Repository) Referencer

NewModelReferencer takes a *model.Repository and returns a Referencer that accesses its references. The resulting Referencer never returns an error.

type RepositoryStore added in v0.14.0

type RepositoryStore interface {
	// Create inserts a new Repository in the store.
	Create(repo *model.Repository) error
	// Get returns a Repository given its ID.
	Get(id kallax.ULID) (*model.Repository, error)
	// GetByEndpoints returns the Repositories that have common endpoints with the
	// list of endpoints passed.
	GetByEndpoints(endpoints ...string) ([]*model.Repository, error)
	// GetRefsByInit returns the References that have the provided Init commit.
	GetRefsByInit(init model.SHA1) ([]*model.Reference, error)
	// InitHasRefs returns true if there is at least one reference with
	// the provided initial commit.
	InitHasRefs(init model.SHA1) (bool, error)
	// SetStatus changes the status of the given repository.
	SetStatus(repo *model.Repository, status model.FetchStatus) error
	// SetEndpoints updates the endpoints of the repository.
	SetEndpoints(repo *model.Repository, endpoints ...string) error
	// UpdateFailed updates the given repository as failed with the given
	// status. No modifications are performed to the repository itself
	// other than setting its status, all the modification to the repo
	// fields must be done before calling this method. That is, changing
	// FetchErrorAt and so on should be done manually before. Refer to the
	// concrete implementation to know what is being updated.
	UpdateFailed(repo *model.Repository, status model.FetchStatus) error
	// UpdateFetched updates the given repository as successfully fetched.
	// No modifications are performed to the repository other than setting
	// the Fetched status and the time when it was fetched, all other changes
	// should be done to the repo before calling this method. Refer to the
	// concrete implementation to know what is being updated.
	UpdateFetched(repo *model.Repository, fetchedAt time.Time) error
}

RepositoryStore is the access layer to the storage of repositories.

type TemporaryCloner

type TemporaryCloner interface {
	Clone(ctx context.Context, id, url string) (TemporaryRepository, error)
}

func NewTemporaryCloner

func NewTemporaryCloner(tmpFs billy.Filesystem) TemporaryCloner

type TemporaryRepository

type TemporaryRepository interface {
	io.Closer
	Referencer
	Push(ctx context.Context, url string, refspecs []config.RefSpec) error
}

type Worker

type Worker struct {
	// contains filtered or unexported fields
}

Worker is a worker that processes jobs from a channel.

func NewWorker

func NewWorker(logger log.Logger, do WorkerFunc, ch chan *WorkerJob) *Worker

NewWorker creates a new Worker. The first parameter is a logger that will be passed to the processing function on every call. The second parameter is the processing function itself, which will be called for every job. The third parameter is a channel that the worker will consume jobs from.

func (*Worker) IsRunning

func (w *Worker) IsRunning() bool

IsRunning returns true if the worker is running.

func (*Worker) Start

func (w *Worker) Start()

Start processes jobs from the input channel until it is stopped. Start blocks until the worker is stopped or the channel is closed.
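
The two exit conditions of Start (an explicit stop, or the input channel being closed) map naturally onto a select loop. The sketch below is hypothetical and simplified: jobs are plain ints, there is no logger or context, and `stop` takes no `immediate` flag.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical, simplified worker; jobs are plain ints here.
type worker struct {
	do   func(int)
	jobs <-chan int
	quit chan struct{}
	once sync.Once
}

func newWorker(do func(int), jobs <-chan int) *worker {
	return &worker{do: do, jobs: jobs, quit: make(chan struct{})}
}

// start processes jobs until stop is called or the jobs channel is closed.
func (w *worker) start() {
	for {
		select {
		case <-w.quit:
			return
		case j, ok := <-w.jobs:
			if !ok {
				return // channel closed: nothing more to process
			}
			w.do(j)
		}
	}
}

// stop asks the worker to exit without waiting for it. It is safe to call
// more than once.
func (w *worker) stop() {
	w.once.Do(func() { close(w.quit) })
}

func main() {
	jobs := make(chan int)
	done := make(chan struct{})
	w := newWorker(func(j int) { fmt.Println("processed job", j) }, jobs)
	go func() {
		w.start()
		close(done)
	}()
	for j := 1; j <= 3; j++ {
		jobs <- j
	}
	close(jobs) // closing the channel makes start return
	<-done
}
```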

func (*Worker) Stop

func (w *Worker) Stop(immediate bool)

Stop stops the worker, but does not wait until it

type WorkerFunc added in v0.13.0

type WorkerFunc func(context.Context, log.Logger, *Job) error

WorkerFunc is the function the workers will execute.

type WorkerJob

type WorkerJob struct {
	*Job
	// contains filtered or unexported fields
}

A WorkerJob is a job to be passed to the worker. It contains the Job itself and an acknowledger that the worker uses to signal that it finished the job.

type WorkerPool

type WorkerPool struct {
	// contains filtered or unexported fields
}

WorkerPool is a pool of workers that can process jobs.

func NewArchiverWorkerPool

func NewArchiverWorkerPool(
	r RepositoryStore, tx repository.RootedTransactioner,
	tc TemporaryCloner,
	ls lock.Service,
	timeout time.Duration,
	lockingTimeout time.Duration,
	copier *repository.Copier,
) *WorkerPool

NewArchiverWorkerPool creates a new WorkerPool that uses an Archiver to process jobs. It takes optional start, stop and warn notifier functions that are equal to the Archiver notifiers but with additional WorkerContext.

func NewWorkerPool

func NewWorkerPool(f WorkerFunc) *WorkerPool

NewWorkerPool creates a new empty worker pool. It takes a function to be used by workers to process jobs. The pool is started with no workers. SetWorkerCount must be called to start them.

func (*WorkerPool) Close

func (wp *WorkerPool) Close() error

Close stops all the workers in the pool and frees the resources used by it, waiting until all the current jobs finish.

func (*WorkerPool) Do

func (wp *WorkerPool) Do(j *WorkerJob)

Do executes a job. It blocks until a worker is assigned to process the job and then it returns, with the worker processing the job asynchronously.

func (*WorkerPool) Len

func (wp *WorkerPool) Len() int

Len returns the number of workers currently in the pool.

func (*WorkerPool) SetWorkerCount

func (wp *WorkerPool) SetWorkerCount(workers int)

SetWorkerCount changes the number of running workers. Workers will be started or stopped as necessary to satisfy the new worker count. It blocks until all required workers are started or stopped. Each worker, if busy, will finish its current job before stopping.
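
A resizable pool with these semantics can be sketched with one quit channel and one done channel per worker. This is an illustration, not the borges implementation: the `pool` type and its methods are hypothetical, jobs are plain ints, and `submit` plays the role of Do by blocking until some worker accepts the job.

```go
package main

import (
	"fmt"
	"sync"
)

// poolWorker holds the channels used to stop one worker and wait for it.
type poolWorker struct {
	quit chan struct{}
	done chan struct{}
}

// pool is a hypothetical, simplified worker pool.
type pool struct {
	do      func(int)
	jobs    chan int
	mu      sync.Mutex
	workers []*poolWorker
}

func newPool(do func(int)) *pool {
	return &pool{do: do, jobs: make(chan int)}
}

// run is the loop of a single worker. A busy worker only sees quit after
// its current job has finished.
func (p *pool) run(w *poolWorker) {
	defer close(w.done)
	for {
		select {
		case <-w.quit:
			return
		case j := <-p.jobs:
			p.do(j)
		}
	}
}

// setWorkerCount starts or stops workers until exactly n are running. When
// shrinking, it waits for each stopped worker to exit.
func (p *pool) setWorkerCount(n int) {
	if n < 0 {
		n = 0
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.workers) < n {
		w := &poolWorker{quit: make(chan struct{}), done: make(chan struct{})}
		go p.run(w)
		p.workers = append(p.workers, w)
	}
	for len(p.workers) > n {
		last := len(p.workers) - 1
		w := p.workers[last]
		p.workers = p.workers[:last]
		close(w.quit)
		<-w.done
	}
}

// submit blocks until a worker takes the job, then returns while the worker
// processes it asynchronously.
func (p *pool) submit(j int) {
	p.jobs <- j
}

// size returns the number of workers currently in the pool.
func (p *pool) size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.workers)
}

func main() {
	results := make(chan string)
	p := newPool(func(j int) { results <- fmt.Sprintf("job %d done", j) })
	p.setWorkerCount(2)
	go func() {
		for j := 1; j <= 4; j++ {
			p.submit(j)
		}
	}()
	for i := 0; i < 4; i++ {
		fmt.Println(<-results)
	}
	p.setWorkerCount(0)
	fmt.Println("workers:", p.size())
}
```

Because the jobs channel is unbuffered, `submit` cannot return before a worker has received the job, which matches the blocking behaviour described for Do.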

func (*WorkerPool) Stop added in v0.13.0

func (wp *WorkerPool) Stop() error

Stop stops all the workers in the pool, including their current jobs, and frees the resources used by it.

Directories

Path Synopsis
cli
Package lock provides implementations for cancellable and distributed locks.
