README ¶
Protobuf specifications and client libraries for Gitaly
This directory was previously hosted at https://gitlab.com/gitlab-org/gitaly-proto. As of Gitaly 1.58.0 and gitaly-proto 1.39.0, all further proto changes will be made here, in
gitaly/proto
.
Gitaly is part of GitLab. It is a server application that uses its own gRPC protocol to communicate with its clients. This repository contains the protocol definition and automatically generated wrapper code for Go and Ruby.
The .proto files define the remote procedure calls for interacting with Gitaly. We keep auto-generated client libraries for Ruby and Go in their respective subdirectories. The list of RPCs can be found here.
Run make proto
from the root of the repository to regenerate the client
libraries after updating .proto files.
See developers.google.com for documentation of the 'proto3' Protocol buffer specification language.
Issues
We have disabled the issue tracker of the gitaly-proto project. Please use the Gitaly issue tracker.
gRPC/Protobuf concepts
The core Protobuf concepts we use are rpc, service and message. We use these to define the Gitaly protocol.
- rpc a function that can be called from the client and that gets executed on the server. Belongs to a service. Can have one of four request/response signatures: message/message (example: get metadata for commit xxx), message/stream (example: get contents of blob xxx), stream/message (example: create new blob with contents xxx), stream/stream (example: git SSH session).
- service a logical group of RPC's.
- message like a JSON object except it has pre-defined types.
- stream an unbounded sequence of messages. In the Ruby clients this looks like an Enumerator.
gRPC provides an implementation framework based on these Protobuf concepts.
- A gRPC server implements one or more services behind a network listener. Example: the Gitaly server application.
- The gRPC toolchain automatically generates client libraries that handle serialization and connection management. Example: the Go client package and Ruby gem in this repository.
- gRPC clients use the client libraries to make remote procedure calls. These clients must decide what network address to reach their gRPC servers on and handle connection reuse: it is possible to spread different gRPC services over multiple connections to the same gRPC server.
- Officially a gRPC connection is called a channel. In the Go gRPC library these channels are called client connections because 'channel' is already a concept in Go itself. In Ruby a gRPC channel is an instance of GRPC::Core::Channel. We use the word 'connection' in this document. The underlying transport of gRPC, HTTP/2, allows multiple remote procedure calls to happen at the same time on a single connection to a gRPC server. In principle, a multi-threaded gRPC client needs only one connection to a gRPC server.
Design decisions
- In Gitaly's case there is one server application https://gitlab.com/gitlab-org/gitaly which implements all services in the protocol.
- In default GitLab installations each Gitaly client interacts with exactly 1 Gitaly server, on the same host, via a Unix domain socket. In a larger installation each Gitaly client will interact with many different Gitaly servers (one per GitLab storage shard) via TCP connections.
- Gitaly uses grpc.Errorf to return meaningful errors to its clients.
- Each RPC
FooBar
has its ownFooBarRequest
andFooBarResponse
message types. Try to keep the structure of these messages as flat as possible. Only add abstractions when they have a practical benefit. - We never make backwards incompatible changes to an RPC that is already implemented on either the client side or server side. Instead we just create a new RPC call and start a deprecation procedure (see below) for the old one.
- It is encouraged to put comments (starting with
//
) in .proto files. Please put comments on their own lines. This will cause them to be treated as documentation by the protoc compiler. - When choosing an RPC name don't use the service name as context.
Good:
service CommitService { rpc CommitExists }
. Bad:service CommitService { rpc Exists }
.
RPC naming conventions
Gitaly-Proto has RPCs that are resource based, for example when querying for a commit. Another class of RPCs are operations, where the result might be empty or one of the RPC error codes but the fact that the operation took place is of importance.
For all RPCs, start the name with a verb, followed by an entity, and if required followed by a further specification. For example:
- GetCommit
- RepackRepositoryIncremental
- CreateRepositoryFromBundle
For resource RPCs the verbs in use are limited to: Get, List, Create, Update,
Delete, or Is. Where both Get and List as verbs denote these operations have no side
effects. These verbs differ in terms of the expected number of results the query
yields. Get queries are limited to one result, and are expected to return one
result to the client. List queries have zero or more results, and generally will
create a gRPC stream for their results. When the Is
verb is used, this RPC
is expected to return a boolean, or an error. For example: IsRepositoryEmpty
.
When an operation based RPC is defined, the verb should map to the first verb in the Git command it represents. Example; FetchRemote.
Note that the current interface defined in this repository does not yet abide fully to these conventions. Newly defined RPCs should, though, so eventually gitaly-proto converges to a common standard.
Common field names and types
As a general principle, remember that Git does not enforce encodings on most data inside repositories, so we can rarely assume data to be a Protobuf "string" (which implies UTF-8).
bytes revision
: for fields that accept any of branch names / tag names / commit ID's. Usesbytes
to be encoding agnostic.string commit_id
: for fields that accept a commit ID.bytes ref
: for fields that accept a refname.bytes path
: for paths inside Git repositories, i.e., inside Gittree
objects.string relative_path
: for paths on disk on a Gitaly server, created by "us" (GitLab the application) instead of the user, we want to use UTF-8, or better, ASCII.
Stream patterns
These are some patterns we already use, or want to use going forward.
Stream response of many small items
rpc FooBar(FooBarRequest) returns (stream FooBarResponse);
message FooBarResponse {
message Item {
// ...
}
repeated Item items = 1;
}
A typical example of an "Item" would be a commit. To avoid the penalty of network IO for each Item we return, we batch them together. You can think of this as a kind of buffered IO at the level of the Item messages. In Go, to ease the bookkeeping you can use gitlab.com/gitlab-org/gitaly/internal/helper/chunker.
Single large item split over multiple messages
rpc FooBar(FooBarRequest) returns (stream FooBarResponse);
message FooBarResponse {
message Header {
// ...
}
oneof payload {
Header header = 1;
bytes data = 2;
}
}
A typical example of a large item would be the contents of a Git blob.
The header might contain the blob OID and the blob size. Only the first
message in the response stream has header
set, all others have data
but no header
.
In the particular case where you're sending back raw binary data from
Go, you can use
gitlab.com/gitlab-org/gitaly/streamio
to turn your gRPC response stream into an io.Writer
.
Note that a number of existing RPC's do not use this pattern exactly; they don't use
oneof
. In practice this creates ambiguity (does the first message contain non-emptydata
?) and encourages complex optimization in the server implementation (trying to squeeze data into the first response message). Usingoneof
avoids this ambiguity.
Many large items split over multiple messages
rpc FooBar(FooBarRequest) returns (stream FooBarResponse);
message FooBarResponse {
message Header {
// ...
}
oneof payload {
Header header = 1;
bytes data = 2;
}
}
This looks the same as the "single large item" case above, except
whenever a new large item begins, we send a new message with a non-empty
header
field.
Footers
If the RPC requires it we can also send a footer using oneof
. But by
default, we prefer headers.
RPC Annotations
In preparation for Gitaly HA, we are now requiring all RPC's to be annotated with an appropriate designation. All methods must contain one of the following lines:
option (op_type).op = ACCESSOR;
- Designates an RPC as being read-only (i.e. side effect free)
option (op_type).op = MUTATOR;
- Designates that an RPC modifies the repository
Failing to designate an RPC correctly will result in a CI error. For example:
--gitaly_out: server.proto: Method ServerInfo missing op_type option
Additionally, all mutator RPC's require additional annotations to clearly indicate what is being modified:
- When an RPC modifies a server-wide resource, the scope should specify
SERVER
. - When an RPC modifies a storage-wide resource, the scope should specify
STORAGE
.- Additionally, every request should contain field marked with
storage
annotation.
- Additionally, every request should contain field marked with
- When an RPC modifies a specific repository, the scope should specify
REPOSITORY
.- Additionally, every RPC with
REPOSITORY
scope, should also specify the target repository and may specify the additional repository.
- Additionally, every RPC with
The target repository represents the location or address of the repository being modified by the operation. This is needed by Praefect (Gitaly HA) in order to properly schedule replications to keep repository replicas up to date.
The target repository annotation marks where the target repository can be
found in the message. The annotation is added near gitaly.Repository
field
(e.g. Repository repository = 1 [(target_repository)=true];
). If annotated field isn't
gitaly.Repository
type then it has to contain field annotated [(repository)=true]
with
correct type. Having separate repository
annotation allows to have same field in child
message annotated as both target_repository
and additional_repository
depending on parent
message.
The additional repository is annotated similarly to target repository but annotation
is named additional_repository
See our examples of valid and invalid proto annotations.
Go Package
If adding new protobuf files, make sure to correctly set the go_package
option
near the top of the file:
option go_package = "gitlab.com/gitlab-org/gitaly-proto/go/gitalypb";
This allows other protobuf files to locate and import the Go generated stubs. If
you forget to add a go_package
option, you may receive an error similar to:
blob.proto is missing the go_package option
Contributing
The CI at https://gitlab.com/gitlab-org/gitaly-proto regenerates the
client libraries to guard against the mistake of updating the .proto
files but not the client libraries. This check uses git diff
to look
for changes. Some of the code in the Go client libraries is sensitive
to implementation details of the Go standard library (specifically,
the output of gzip). Use the same Go version as .gitlab-ci.yml (Go
1.12) when generating new client libraries for a merge request.
Build process
After you change or add a .proto file you need to re-generate the Go and Ruby libraries before committing your change.
# Re-generate Go and Ruby libraries
make generate
How to deprecate an RPC call
See DEPRECATION.md.
Release
This will tag and release the gitaly-proto library, including pushing the gem to rubygems.org
make release version=X.Y.Z
How to manually push the gem
If the release script fails the gem may not be pushed. This is how you can do that after the fact:
# Use a sub-shell to limit scope of 'set -e'
(
set -e
# Replace X.Y.Z with the version you are pushing
GEM_VERSION=X.Y.Z
git checkout v$GEM_VERSION
gem build gitaly.gemspec
gem push gitaly-$GEM_VERSION.gem
)
Directories ¶
Path | Synopsis |
---|---|
go
|
|
internal/cmd/protoc-gen-gitaly
Command protoc-gen-gitaly is designed to be used as a protobuf compiler plugin to verify Gitaly processes are being followed when writing RPC's.
|
Command protoc-gen-gitaly is designed to be used as a protobuf compiler plugin to verify Gitaly processes are being followed when writing RPC's. |