Documentation ¶
Overview ¶
Package packfile documentation:
GIT pack format
===============
== pack-*.pack files have the following format:
A header appears at the beginning and consists of the following:
4-byte signature: The signature is: {'P', 'A', 'C', 'K'}
4-byte version number (network byte order): GIT currently accepts version number 2 or 3 but generates version 2 only.
4-byte number of objects contained in the pack (network byte order)
Observation: we cannot have more than 4G versions ;-) and more than 4G objects in a pack.
The header is followed by that number of object entries, each of which looks like this:

    (undeltified representation)
    n-byte type and length (3-bit type, (n-1)*7+4-bit length)
    compressed data

    (deltified representation)
    n-byte type and length (3-bit type, (n-1)*7+4-bit length)
    20-byte base object name
    compressed delta data
Observation: length of each object is encoded in a variable length format and is not constrained to 32-bit or anything.
The trailer records 20-byte SHA1 checksum of all of the above.
== Original (version 1) pack-*.idx files have the following format:
The header consists of 256 4-byte network byte order integers. N-th entry of this table records the number of objects in the corresponding pack, the first byte of whose object name is less than or equal to N. This is called the 'first-level fan-out' table.
The header is followed by sorted 24-byte entries, one entry per object in the pack. Each entry is:
4-byte network byte order integer, recording where the object is stored in the packfile as the offset from the beginning.
20-byte object name.
The file is concluded with a trailer:
A copy of the 20-byte SHA1 checksum at the end of corresponding packfile.
20-byte SHA1-checksum of all of the above.
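The fan-out table narrows a lookup to the entries sharing the first byte of the object name, after which a binary search finds the entry. The sketch below illustrates this for a version 1 index held in memory. The helpers lookupV1 and buildV1 are hypothetical, and buildV1 only produces the fan-out and entry tables (no trailer) so that the example has something to search.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

const (
	fanoutSize = 256 * 4 // 256 4-byte counters
	entrySize  = 24      // 4-byte offset + 20-byte object name
)

// lookupV1 returns the pack offset recorded for name in a v1 index.
func lookupV1(idx []byte, name [20]byte) (uint32, bool) {
	if len(idx) < fanoutSize {
		return 0, false
	}
	fanout := func(i int) int {
		return int(binary.BigEndian.Uint32(idx[i*4:]))
	}
	// Names whose first byte is name[0] occupy entries [lo, hi).
	lo := 0
	if name[0] > 0 {
		lo = fanout(int(name[0]) - 1)
	}
	hi := fanout(int(name[0]))
	if lo > hi || fanoutSize+hi*entrySize > len(idx) {
		return 0, false // corrupt or truncated index
	}
	entry := func(i int) []byte {
		start := fanoutSize + i*entrySize
		return idx[start : start+entrySize]
	}
	// Entries are sorted by name, so binary search inside the bucket.
	n := sort.Search(hi-lo, func(i int) bool {
		return bytes.Compare(entry(lo+i)[4:], name[:]) >= 0
	})
	if lo+n < hi && bytes.Equal(entry(lo+n)[4:], name[:]) {
		return binary.BigEndian.Uint32(entry(lo + n)), true
	}
	return 0, false
}

// buildV1 builds the fan-out and entry tables for already sorted names.
func buildV1(names [][20]byte, offsets []uint32) []byte {
	idx := make([]byte, fanoutSize)
	for _, n := range names {
		// Every counter at or after the first byte includes this object.
		for b := int(n[0]); b < 256; b++ {
			c := binary.BigEndian.Uint32(idx[b*4:])
			binary.BigEndian.PutUint32(idx[b*4:], c+1)
		}
	}
	for i, n := range names {
		var e [entrySize]byte
		binary.BigEndian.PutUint32(e[:4], offsets[i])
		copy(e[4:], n[:])
		idx = append(idx, e[:]...)
	}
	return idx
}

func main() {
	a := [20]byte{0x00, 0xaa}
	b := [20]byte{0x01, 0xbb}
	idx := buildV1([][20]byte{a, b}, []uint32{12, 345})
	fmt.Println(lookupV1(idx, b))                    // 345 true
	fmt.Println(lookupV1(idx, [20]byte{0x01, 0xcc})) // 0 false
}
```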
Pack Idx file:

        --  +--------------------------------+
fanout      | fanout[0] = 2 (for example)    |-.
table       +--------------------------------+ |
            | fanout[1]                      | |
            +--------------------------------+ |
            | fanout[2]                      | |
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
            | fanout[255] = total objects    |---.
        --  +--------------------------------+ | |
main        | offset                         | | |
index       | object name 00XXXXXXXXXXXXXXXX | | |
table       +--------------------------------+ | |
            | offset                         | | |
            | object name 00XXXXXXXXXXXXXXXX | | |
            +--------------------------------+<+ |
          .-| offset                         |   |
          | | object name 01XXXXXXXXXXXXXXXX |   |
          | +--------------------------------+   |
          | | offset                         |   |
          | | object name 01XXXXXXXXXXXXXXXX |   |
          | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   |
          | | offset                         |   |
          | | object name FFXXXXXXXXXXXXXXXX |   |
        --| +--------------------------------+<--+
trailer   | | packfile checksum              |
          | +--------------------------------+
          | | idxfile checksum               |
          | +--------------------------------+
          .-------.
                  |
Pack file entry: <+
    packed object header:
        1-byte size extension bit (MSB)
               type (next 3 bit)
               size0 (lower 4-bit)
        n-byte sizeN (as long as MSB is set, each 7-bit)
               size0..sizeN form 4+7+7+..+7 bit integer, size0
               is the least significant part, and sizeN is the
               most significant part.
    packed object data:
        If it is not DELTA, then deflated bytes (the size above
               is the size before compression).
        If it is REF_DELTA, then
          20-byte base object name SHA1 (the size above is the
               size of the delta data that follows).
          delta data, deflated.
        If it is OFS_DELTA, then
          n-byte offset (see below) interpreted as a negative
               offset from the type-byte of the header of the
               ofs-delta entry (the size above is the size of
               the delta data that follows).
          delta data, deflated.

    offset encoding:
        n bytes with MSB set in all but the last one.
        The offset is then the number constructed by
        concatenating the lower 7 bit of each byte, and
        for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
        to the result.
== Version 2 pack-*.idx files support packs larger than 4 GiB, and have some other reorganizations. They have the format:

- A 4-byte magic number '\377tOc' which is an unreasonable fanout[0] value.
- A 4-byte version number (= 2)
- A 256-entry fan-out table just like v1.
- A table of sorted 20-byte SHA1 object names. These are packed together without offset values to reduce the cache footprint of the binary search for a specific object name.
- A table of 4-byte CRC32 values of the packed object data. This is new in v2 so compressed data can be copied directly from pack to pack during repacking without undetected data corruption.
- A table of 4-byte offset values (in network byte order). These are usually 31-bit pack file offsets, but large offsets are encoded as an index into the next table with the msbit set.
- A table of 8-byte offset entries (empty for pack files less than 2 GiB). Pack files are organized with heavily used objects toward the front, so most object references should not need to refer to this table.
- The same trailer as a v1 pack file:
  A copy of the 20-byte SHA1 checksum at the end of corresponding packfile.
  20-byte SHA1-checksum of all of the above.
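The two-level offset scheme of version 2 can be illustrated as follows. This is a sketch only: isV2Index and resolveOffset are hypothetical helpers, and the caller is assumed to have already sliced the 4-byte and 8-byte offset tables out of the index file.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// v2Magic is '\377tOc', the magic number that opens a v2 index.
var v2Magic = []byte{0xff, 't', 'O', 'c'}

// isV2Index reports whether idx starts with the v2 magic and version 2.
func isV2Index(idx []byte) bool {
	return len(idx) >= 8 &&
		bytes.Equal(idx[:4], v2Magic) &&
		binary.BigEndian.Uint32(idx[4:8]) == 2
}

// resolveOffset returns the pack offset of the i-th object given the raw
// 4-byte offset table (small) and the raw 8-byte offset table (large).
func resolveOffset(small, large []byte, i int) (uint64, error) {
	if i < 0 || i >= len(small)/4 {
		return 0, errors.New("object index out of range")
	}
	v := binary.BigEndian.Uint32(small[i*4:])
	if v&0x80000000 == 0 {
		return uint64(v), nil // plain 31-bit offset
	}
	// msbit set: the remaining 31 bits index the 8-byte table.
	j := int(v & 0x7fffffff)
	if j >= len(large)/8 {
		return 0, errors.New("large offset index out of range")
	}
	return binary.BigEndian.Uint64(large[j*8:]), nil
}

func main() {
	small := []byte{0, 0, 0, 12, 0x80, 0, 0, 0}
	large := []byte{0, 0, 0, 1, 0, 0, 0, 0} // 1 << 32
	fmt.Println(resolveOffset(small, large, 0)) // 12 <nil>
	fmt.Println(resolveOffset(small, large, 1)) // 4294967296 <nil>
}
```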
From: https://www.kernel.org/pub/software/scm/git/docs/v1.7.5/technical/pack-protocol.txt
Index ¶
- Constants
- Variables
- func PatchDelta(src, delta []byte) []byte
- type Decoder
- type Error
- type Format
- type Parser
- func (p *Parser) IsSupportedVersion(v uint32) bool
- func (p Parser) IsValidSignature(sig []byte) bool
- func (p *Parser) ReadCount() (uint32, error)
- func (p Parser) ReadHash() (core.Hash, error)
- func (p Parser) ReadHeader() (uint32, error)
- func (p Parser) ReadNegativeOffset() (int64, error)
- func (p Parser) ReadNonDeltaObjectContent() ([]byte, error)
- func (p Parser) ReadOFSDeltaObjectContent(start int64) ([]byte, core.ObjectType, error)
- func (p Parser) ReadObject() (core.Object, error)
- func (p Parser) ReadObjectTypeAndLength() (core.ObjectType, int64, error)
- func (p Parser) ReadREFDeltaObjectContent() ([]byte, core.ObjectType, error)
- func (p *Parser) ReadSignature() ([]byte, error)
- func (p Parser) ReadSolveDelta(base []byte) ([]byte, error)
- func (p *Parser) ReadVersion() (uint32, error)
- type ReadRecaller
- type Seekable
- func (r *Seekable) ForgetAll()
- func (r *Seekable) Offset() (int64, error)
- func (r *Seekable) Read(p []byte) (int, error)
- func (r *Seekable) ReadByte() (byte, error)
- func (r *Seekable) RecallByHash(h core.Hash) (core.Object, error)
- func (r *Seekable) RecallByOffset(o int64) (obj core.Object, err error)
- func (r *Seekable) Remember(o int64, obj core.Object) error
- type Stream
- func (r *Stream) ForgetAll()
- func (r *Stream) Offset() (int64, error)
- func (r *Stream) Read(p []byte) (n int, err error)
- func (r *Stream) ReadByte() (byte, error)
- func (r *Stream) RecallByHash(h core.Hash) (core.Object, error)
- func (r *Stream) RecallByOffset(o int64) (core.Object, error)
- func (r *Stream) Remember(o int64, obj core.Object) error
Constants ¶
const (
	// DefaultMaxObjectsLimit is the maximum number of objects the
	// decoder will decode before returning ErrMaxObjectsLimitReached.
	DefaultMaxObjectsLimit = 1 << 20
)
const (
	// VersionSupported is the packfile version supported by this parser.
	VersionSupported = 2
)
Variables ¶
var (
	// ErrMaxObjectsLimitReached is returned by Decode when the number
	// of objects in the packfile is higher than
	// Decoder.MaxObjectsLimit.
	ErrMaxObjectsLimitReached = NewError("max. objects limit reached")
	// ErrInvalidObject is returned by Decode when an invalid object is
	// found in the packfile.
	ErrInvalidObject = NewError("invalid git object")
	// ErrPackEntryNotFound is returned by Decode when a reference in
	// the packfile references an unknown object.
	ErrPackEntryNotFound = NewError("can't find a pack entry")
	// ErrZLib is returned by Decode when there was an error unzipping
	// the packfile contents.
	ErrZLib = NewError("zlib reading error")
)
var (
	// ErrEmptyPackfile is returned by ReadHeader when no data is found in the packfile.
	ErrEmptyPackfile = NewError("empty packfile")
	// ErrBadSignature is returned by ReadHeader when the signature in the packfile is incorrect.
	ErrBadSignature = NewError("malformed pack file signature")
	// ErrUnsupportedVersion is returned by ReadHeader when the packfile version is
	// different than VersionSupported.
	ErrUnsupportedVersion = NewError("unsupported packfile version")
)
var (
	// ErrDuplicatedObject is returned by Remember if an object appears several
	// times in a packfile.
	ErrDuplicatedObject = NewError("duplicated object")
	// ErrCannotRecall is returned by RecallByOffset or RecallByHash if the object
	// to recall cannot be returned.
	ErrCannotRecall = NewError("cannot recall object")
)
Functions ¶
Types ¶
type Decoder ¶
type Decoder struct {
	// MaxObjectsLimit is the maximum number of objects to load from the
	// packfile. If a packfile exceeds this number, an error is returned.
	// The default value is DefaultMaxObjectsLimit, which is usually more
	// than enough to work with any repository; with higher values and huge
	// repositories you can run out of memory.
	MaxObjectsLimit uint32
	// contains filtered or unexported fields
}
Decoder reads and decodes packfiles from an input stream.
func NewDecoder ¶
func NewDecoder(r ReadRecaller) *Decoder
NewDecoder returns a new Decoder that reads from r.
func (*Decoder) Decode ¶
func (d *Decoder) Decode(s core.ObjectStorage) error
Decode reads a packfile and stores it in the value pointed to by s.
type Error ¶
type Error struct {
// contains filtered or unexported fields
}
Error specifies errors returned during packfile parsing.
func (*Error) AddDetails ¶
AddDetails adds details to an error, with additional text.
type Parser ¶
type Parser struct {
ReadRecaller
}
A Parser is a collection of functions to read and process data from a packfile. Values of this type are not zero-value safe. See the NewParser function below.
func NewParser ¶
func NewParser(r ReadRecaller) *Parser
NewParser returns a new Parser that reads from the packfile represented by r.
func (*Parser) IsSupportedVersion ¶
IsSupportedVersion returns whether version v is supported by the parser. The current supported version is VersionSupported, defined above.
func (Parser) IsValidSignature ¶
IsValidSignature reports whether sig is a valid packfile signature.
func (*Parser) ReadCount ¶
ReadCount reads and returns the object count field of a packfile.
func (Parser) ReadHeader ¶
ReadHeader reads the whole packfile header (signature, version and object count). It returns the object count and performs checks on the validity of the signature and the version fields.
func (Parser) ReadNegativeOffset ¶
ReadNegativeOffset reads and returns an offset from an OFS DELTA object entry in a packfile. OFS DELTA offsets are encoded in Git's special VLQ format:
Ordinary VLQ has some redundancies. For example, the number 358 can be encoded as the 2-octet VLQ 0x8266 or the 3-octet VLQ 0x808266 or the 4-octet VLQ 0x80808266 and so forth.
To avoid these redundancies, the VLQ format used in Git removes this prepending redundancy and extends the representable range of shorter VLQs by adding an offset to VLQs of 2 or more octets in such a way that the lowest possible value for such an (N+1)-octet VLQ becomes exactly one more than the maximum possible value for an N-octet VLQ. In particular, since a 1-octet VLQ can store a maximum value of 127, the minimum 2-octet VLQ (0x8000) is assigned the value 128 instead of 0. Conversely, the maximum value of such a 2-octet VLQ (0xff7f) is 16511 instead of just 16383. Similarly, the minimum 3-octet VLQ (0x808000) has a value of 16512 instead of zero, which means that the maximum 3-octet VLQ (0xffff7f) is 2113663 instead of just 2097151. And so forth.
This is how the offset is saved in C:
    dheader[pos] = ofs & 127;
    while (ofs >>= 7)
        dheader[--pos] = 128 | (--ofs & 127);
func (Parser) ReadNonDeltaObjectContent ¶
ReadNonDeltaObjectContent reads and returns a non-deltified object from its zlib stream in an object entry in the packfile.
func (Parser) ReadOFSDeltaObjectContent ¶
ReadOFSDeltaObjectContent reads and returns an object specified by an OFS-delta entry in the packfile, from its negative offset onwards. The start parameter is the offset of this particular object entry (the current offset minus the already processed type and length).
func (Parser) ReadObject ¶
ReadObject reads and returns a git object from an object entry in the packfile. Non-deltified and deltified objects are supported.
func (Parser) ReadObjectTypeAndLength ¶
func (p Parser) ReadObjectTypeAndLength() (core.ObjectType, int64, error)
ReadObjectTypeAndLength reads and returns the object type and the length field from an object entry in a packfile.
func (Parser) ReadREFDeltaObjectContent ¶
func (p Parser) ReadREFDeltaObjectContent() ([]byte, core.ObjectType, error)
ReadREFDeltaObjectContent reads and returns an object specified by a REF-delta entry in the packfile, from the hash onwards.
func (*Parser) ReadSignature ¶
ReadSignature reads and returns the signature field in the packfile.
func (Parser) ReadSolveDelta ¶
ReadSolveDelta reads the zlib-compressed delta data in the delta portion of an object entry in the packfile, applies it to base, and returns the patched result.
type ReadRecaller ¶
type ReadRecaller interface {
	// Read reads up to len(p) bytes into p.
	Read(p []byte) (int, error)
	// ReadByte is needed because of these:
	// - https://github.com/golang/go/commit/7ba54d45732219af86bde9a5b73c145db82b70c6
	// - https://groups.google.com/forum/#!topic/golang-nuts/fWTRdHpt0QI
	// - https://gowalker.org/compress/zlib#NewReader
	ReadByte() (byte, error)
	// Offset returns the number of bytes parsed so far from the
	// packfile.
	Offset() (int64, error)
	// Remember asks the ReadRecaller to remember the offset and hash for
	// an object, so you can later call RecallByOffset and RecallByHash.
	Remember(int64, core.Object) error
	// ForgetAll forgets all previously remembered objects.
	ForgetAll()
	// RecallByOffset returns the previously processed object found at a
	// given offset.
	RecallByOffset(int64) (core.Object, error)
	// RecallByHash returns the previously processed object with the
	// given hash.
	RecallByHash(core.Hash) (core.Object, error)
}
The ReadRecaller interface has all the functions needed by a packfile Parser to operate. We provide two very different implementations: Seekable and Stream.
type Seekable ¶
type Seekable struct {
	io.ReadSeeker
	HashToOffset map[core.Hash]int64
}
Seekable implements ReadRecaller for the io.ReadSeeker of a packfile. Remembering does not actually store any reference to the remembered objects; the object offset is remembered instead and the packfile is read again every time a recall operation is requested. This saves memory but can be very slow if the associated io.ReadSeeker is slow (like a hard disk).
func NewSeekable ¶
func NewSeekable(r io.ReadSeeker) *Seekable
NewSeekable returns a new Seekable that reads from r.
func (*Seekable) ForgetAll ¶
func (r *Seekable) ForgetAll()
ForgetAll forgets all previously remembered objects. For efficiency reasons, RecallByOffset always finds objects, even if they have been forgotten or were never remembered.
func (*Seekable) Offset ¶
Offset returns the offset for the next Read or ReadByte.
func (*Seekable) Read ¶
Read reads up to len(p) bytes into p.
func (*Seekable) RecallByHash ¶
RecallByHash returns the object for a given hash by looking for it again in the io.ReadSeeker.
func (*Seekable) RecallByOffset ¶
RecallByOffset returns the object for a given offset by looking for it again in the io.ReadSeeker. For efficiency reasons, this method always finds objects by offset, even if they have not been remembered or if they have been forgotten.
func (*Seekable) Remember ¶
Remember stores the offset of the object and its hash, but not the object itself. This implementation does not check for already stored offsets, as it is too expensive to build this information from an index every time a get operation is performed on the Seekable.
type Stream ¶
Stream implements ReadRecaller for the io.Reader of a packfile. This implementation keeps all remembered objects referenced in maps for quick access.
func NewStream ¶
NewStream returns a new Stream that reads from r.
func (*Stream) ForgetAll ¶
func (r *Stream) ForgetAll()
ForgetAll forgets all previously remembered objects.
func (*Stream) Offset ¶
Offset returns the number of bytes read.
func (*Stream) Read ¶
Read reads up to len(p) bytes into p.
func (*Stream) RecallByHash ¶
RecallByHash returns the previously remembered object that has the given hash.
func (*Stream) RecallByOffset ¶
RecallByOffset returns the previously remembered object whose object entry starts at the given offset in the packfile.