packfile

package
v3.1.0+incompatible
Warning

This package is not in the latest version of its module.

Published: Jul 4, 2016 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package packfile documentation:

GIT pack format
===============

== pack-*.pack files have the following format:

  • A header appears at the beginning and consists of the following:

    4-byte signature: The signature is: {'P', 'A', 'C', 'K'}

    4-byte version number (network byte order): GIT currently accepts version number 2 or 3 but generates version 2 only.

    4-byte number of objects contained in the pack (network byte order)

    Observation: we cannot have more than 4G versions ;-) and more than 4G objects in a pack.

  • The header is followed by a number of object entries, each of which looks like this:

    (undeltified representation)
    n-byte type and length (3-bit type, (n-1)*7+4-bit length)
    compressed data

    (deltified representation)
    n-byte type and length (3-bit type, (n-1)*7+4-bit length)
    20-byte base object name
    compressed delta data

    Observation: length of each object is encoded in a variable length format and is not constrained to 32-bit or anything.

  • The trailer records a 20-byte SHA1 checksum of all of the above.

== Original (version 1) pack-*.idx files have the following format:

  • The header consists of 256 4-byte network byte order integers. N-th entry of this table records the number of objects in the corresponding pack, the first byte of whose object name is less than or equal to N. This is called the 'first-level fan-out' table.

  • The header is followed by sorted 24-byte entries, one entry per object in the pack. Each entry is:

    4-byte network byte order integer, recording where the object is stored in the packfile as the offset from the beginning.

    20-byte object name.

  • The file is concluded with a trailer:

    A copy of the 20-byte SHA1 checksum at the end of corresponding packfile.

    20-byte SHA1-checksum of all of the above.
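To look up an object in a version 1 index, the fan-out table narrows the search to the entries whose names share the first byte, and a binary search over that range finds the name. The sketch below assumes the index has already been split into its fan-out table and its sorted 24-byte entries; `idxV1`, `buildIdxV1` and `find` are hypothetical helpers, not part of this package.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// idxV1 is a hypothetical in-memory view of a version 1 .idx file.
type idxV1 struct {
	fanout  [256]uint32 // fanout[n] = number of objects whose first byte is <= n
	entries []byte      // sorted 24-byte records: 4-byte offset + 20-byte name
}

func (x *idxV1) nameAt(i int) []byte { return x.entries[i*24+4 : i*24+24] }

func (x *idxV1) offsetAt(i int) uint32 {
	return binary.BigEndian.Uint32(x.entries[i*24 : i*24+4])
}

// find restricts the search to [fanout[b-1], fanout[b]) for first byte b,
// then binary-searches that range for the full 20-byte name.
func (x *idxV1) find(name [20]byte) (uint32, bool) {
	lo := 0
	if name[0] > 0 {
		lo = int(x.fanout[name[0]-1])
	}
	hi := int(x.fanout[name[0]])
	i := lo + sort.Search(hi-lo, func(k int) bool {
		return bytes.Compare(x.nameAt(lo+k), name[:]) >= 0
	})
	if i < hi && bytes.Equal(x.nameAt(i), name[:]) {
		return x.offsetAt(i), true
	}
	return 0, false
}

// buildIdxV1 builds the table and entries from names sorted in ascending order.
func buildIdxV1(names [][20]byte, offsets []uint32) *idxV1 {
	x := &idxV1{}
	for i, n := range names {
		var rec [24]byte
		binary.BigEndian.PutUint32(rec[:4], offsets[i])
		copy(rec[4:], n[:])
		x.entries = append(x.entries, rec[:]...)
		for b := int(n[0]); b < 256; b++ {
			x.fanout[b]++ // this object counts for every byte value >= its first byte
		}
	}
	return x
}

func main() {
	a, b, c := [20]byte{0x00, 1}, [20]byte{0x01, 7}, [20]byte{0xff, 9}
	x := buildIdxV1([][20]byte{a, b, c}, []uint32{12, 340, 5000})
	off, ok := x.find(b)
	fmt.Println(off, ok) // 340 true
}
```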

Pack Idx file:

            --  +--------------------------------+
    fanout      | fanout[0] = 2 (for example)    |-.
    table       +--------------------------------+ |
                | fanout[1]                      | |
                +--------------------------------+ |
                | fanout[2]                      | |
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
                | fanout[255] = total objects    |---.
            --  +--------------------------------+ | |
    main        | offset                         | | |
    index       | object name 00XXXXXXXXXXXXXXXX | | |
    table       +--------------------------------+ | |
                | offset                         | | |
                | object name 00XXXXXXXXXXXXXXXX | | |
                +--------------------------------+<+ |
              .-| offset                         |   |
              | | object name 01XXXXXXXXXXXXXXXX |   |
              | +--------------------------------+   |
              | | offset                         |   |
              | | object name 01XXXXXXXXXXXXXXXX |   |
              | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   |
              | | offset                         |   |
              | | object name FFXXXXXXXXXXXXXXXX |   |
            --| +--------------------------------+<--+
    trailer   | | packfile checksum              |
              | +--------------------------------+
              | | idxfile checksum               |
              | +--------------------------------+
              .-------.
                      |
    Pack file entry: <+

    packed object header:
        1-byte size extension bit (MSB)
               type (next 3 bit)
               size0 (lower 4-bit)
        n-byte sizeN (as long as MSB is set, each 7-bit)
               size0..sizeN form 4+7+7+..+7 bit integer, size0
               is the least significant part, and sizeN is the
               most significant part.
    packed object data:
        If it is not DELTA, then deflated bytes (the size above
               is the size before compression).
        If it is REF_DELTA, then
          20-byte base object name SHA1 (the size above is the
               size of the delta data that follows).
          delta data, deflated.
        If it is OFS_DELTA, then
          n-byte offset (see below) interpreted as a negative
               offset from the type-byte of the header of the
               ofs-delta entry (the size above is the size of
               the delta data that follows).
          delta data, deflated.

    offset encoding:
          n bytes with MSB set in all but the last one.
          The offset is then the number constructed by
          concatenating the lower 7 bit of each byte, and
          for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
          to the result.

== Version 2 pack-*.idx files support packs larger than 4 GiB, and have some other reorganizations. They have the format:

- A 4-byte magic number '\377tOc' which is an unreasonable
  fanout[0] value.

- A 4-byte version number (= 2)

- A 256-entry fan-out table just like v1.

- A table of sorted 20-byte SHA1 object names.  These are
  packed together without offset values to reduce the cache
  footprint of the binary search for a specific object name.

- A table of 4-byte CRC32 values of the packed object data.
  This is new in v2 so compressed data can be copied directly
  from pack to pack during repacking without undetected
  data corruption.

- A table of 4-byte offset values (in network byte order).
  These are usually 31-bit pack file offsets, but large
  offsets are encoded as an index into the next table with
  the msbit set.

- A table of 8-byte offset entries (empty for pack files less
  than 2 GiB).  Pack files are organized with heavily used
  objects toward the front, so most object references should
  not need to refer to this table.

- The same trailer as a v1 pack file:

  A copy of the 20-byte SHA1 checksum at the end of
  corresponding packfile.

  20-byte SHA1-checksum of all of the above.

From: https://www.kernel.org/pub/software/scm/git/docs/v1.7.5/technical/pack-protocol.txt
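Resolving an object's pack offset in a version 2 index comes down to checking the MSB of its 4-byte entry. A minimal sketch, assuming the 4-byte and 8-byte offset tables have already been sliced out of the .idx file; `resolveV2Offset` is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// resolveV2Offset returns the pack offset of the i-th object. A 4-byte entry
// without its MSB set is the offset itself; with the MSB set, its low 31 bits
// index the table of 8-byte offsets.
func resolveV2Offset(offsets32, offsets64 []byte, i int) (uint64, error) {
	if i < 0 || (i+1)*4 > len(offsets32) {
		return 0, errors.New("object index out of range")
	}
	v := binary.BigEndian.Uint32(offsets32[i*4:])
	if v&0x80000000 == 0 {
		return uint64(v), nil // ordinary 31-bit offset
	}
	j := int(v & 0x7fffffff)
	if (j+1)*8 > len(offsets64) {
		return 0, errors.New("large offset index out of range")
	}
	return binary.BigEndian.Uint64(offsets64[j*8:]), nil
}

func main() {
	// Entry 0 is a small offset; entry 1 points at large-offset slot 0.
	off32 := []byte{0x00, 0x00, 0x01, 0x00, 0x80, 0x00, 0x00, 0x00}
	off64 := []byte{0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00} // 4 GiB
	a, _ := resolveV2Offset(off32, off64, 0)
	b, _ := resolveV2Offset(off32, off64, 1)
	fmt.Println(a, b) // 256 4294967296
}
```

Because the 8-byte table is empty for packs smaller than 2 GiB, the large-offset branch is rarely taken in practice.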

Index

Constants

const (
	// DefaultMaxObjectsLimit is the maximum amount of objects the
	// decoder will decode before returning ErrMaxObjectsLimitReached.
	DefaultMaxObjectsLimit = 1 << 20
)
const (
	// VersionSupported is the packfile version supported by this parser.
	VersionSupported = 2
)

Variables

var (
	// ErrMaxObjectsLimitReached is returned by Decode when the number
	// of objects in the packfile is higher than
	// Decoder.MaxObjectsLimit.
	ErrMaxObjectsLimitReached = NewError("max. objects limit reached")

	// ErrInvalidObject is returned by Decode when an invalid object is
	// found in the packfile.
	ErrInvalidObject = NewError("invalid git object")

	// ErrPackEntryNotFound is returned by Decode when a reference in
	// the packfile references an unknown object.
	ErrPackEntryNotFound = NewError("can't find a pack entry")

	// ErrZLib is returned by Decode when there was an error unzipping
	// the packfile contents.
	ErrZLib = NewError("zlib reading error")
)
var (
	// ErrEmptyPackfile is returned by ReadHeader when no data is found in the packfile
	ErrEmptyPackfile = NewError("empty packfile")
	// ErrBadSignature is returned by ReadHeader when the signature in the packfile is incorrect.
	ErrBadSignature = NewError("malformed pack file signature")
	// ErrUnsupportedVersion is returned by ReadHeader when the packfile version is
	// different from VersionSupported.
	ErrUnsupportedVersion = NewError("unsupported packfile version")
)
var (
	// ErrDuplicatedObject is returned by Remember if an object appears several
	// times in a packfile.
	ErrDuplicatedObject = NewError("duplicated object")
	// ErrCannotRecall is returned by RecallByOffset or RecallByHash if the object
	// to recall cannot be returned.
	ErrCannotRecall = NewError("cannot recall object")
)

Functions

func PatchDelta

func PatchDelta(src, delta []byte) []byte

PatchDelta returns the result of applying the modification deltas in delta to src.
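Git's delta format starts with two little-endian base-128 sizes (source, then target), followed by commands. A command byte with the MSB set copies a range of src: its low four bits say which offset bytes follow and bits 0x10, 0x20 and 0x40 say which size bytes follow, with a size of 0 meaning 0x10000. Any other non-zero command inserts that many literal bytes. The following is an independent sketch of that format, not this package's code; `applyDelta` and `deltaVarint` are hypothetical names, and unlike PatchDelta the sketch reports malformed input through an error.

```go
package main

import (
	"errors"
	"fmt"
)

// deltaVarint reads one little-endian base-128 size from the start of d.
func deltaVarint(d []byte) (uint64, []byte, error) {
	var v uint64
	var shift uint
	for i, c := range d {
		v |= uint64(c&0x7f) << shift
		shift += 7
		if c&0x80 == 0 {
			return v, d[i+1:], nil
		}
	}
	return 0, nil, errors.New("truncated size")
}

// applyDelta rebuilds the target object from src and a delta.
func applyDelta(src, delta []byte) ([]byte, error) {
	srcSize, d, err := deltaVarint(delta)
	if err != nil {
		return nil, err
	}
	if srcSize != uint64(len(src)) {
		return nil, errors.New("source size mismatch")
	}
	dstSize, d, err := deltaVarint(d)
	if err != nil {
		return nil, err
	}
	var dst []byte
	for len(d) > 0 {
		cmd := d[0]
		d = d[1:]
		switch {
		case cmd&0x80 != 0: // copy a range of src
			var off, n uint64
			for i := uint(0); i < 7; i++ {
				if cmd&(1<<i) == 0 {
					continue // this offset/size byte is omitted (zero)
				}
				if len(d) == 0 {
					return nil, errors.New("truncated copy command")
				}
				if i < 4 {
					off |= uint64(d[0]) << (8 * i)
				} else {
					n |= uint64(d[0]) << (8 * (i - 4))
				}
				d = d[1:]
			}
			if n == 0 {
				n = 0x10000
			}
			if off+n > uint64(len(src)) {
				return nil, errors.New("copy out of range")
			}
			dst = append(dst, src[off:off+n]...)
		case cmd != 0: // insert the next cmd bytes literally
			if int(cmd) > len(d) {
				return nil, errors.New("truncated insert")
			}
			dst = append(dst, d[:cmd]...)
			d = d[cmd:]
		default:
			return nil, errors.New("reserved delta command 0")
		}
	}
	if uint64(len(dst)) != dstSize {
		return nil, errors.New("target size mismatch")
	}
	return dst, nil
}

func main() {
	src := []byte("hello world")
	// Sizes 11 and 11; copy src[0:6] ("hello "), then insert "there".
	delta := []byte{11, 11, 0x90, 6, 5, 't', 'h', 'e', 'r', 'e'}
	out, err := applyDelta(src, delta)
	fmt.Println(string(out), err) // hello there <nil>
}
```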

Types

type Decoder

type Decoder struct {
	// MaxObjectsLimit is the maximum number of objects to load from the
	// packfile. If a packfile exceeds this number, Decode returns
	// ErrMaxObjectsLimitReached. The default value is defined by
	// DefaultMaxObjectsLimit and is usually more than enough for any
	// repository; with higher values and huge repositories you can run
	// out of memory.
	MaxObjectsLimit uint32
	// contains filtered or unexported fields
}

Decoder reads and decodes packfiles from an input stream.

func NewDecoder

func NewDecoder(r ReadRecaller) *Decoder

NewDecoder returns a new Decoder that reads from r.

func (*Decoder) Decode

func (d *Decoder) Decode(s core.ObjectStorage) error

Decode reads a packfile and stores the decoded objects in s.

type Error

type Error struct {
	// contains filtered or unexported fields
}

Error specifies errors returned during packfile parsing.

func NewError

func NewError(reason string) *Error

NewError returns a new error.

func (*Error) AddDetails

func (e *Error) AddDetails(format string, args ...interface{}) *Error

AddDetails adds details to an error, with additional text.

func (*Error) Error

func (e *Error) Error() string

Error returns a text representation of the error.
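One way to implement this reason-plus-details pattern is sketched below. `packError`, `newPackError` and `errBadSignature` are hypothetical stand-ins for Error, NewError and ErrBadSignature; whether the real AddDetails copies or modifies its receiver is not stated here, so the sketch assumes a copy, which keeps package-level sentinels unchanged.

```go
package main

import "fmt"

// packError pairs a fixed reason with optional, formatted details.
type packError struct {
	reason, details string
}

func newPackError(reason string) *packError { return &packError{reason: reason} }

// AddDetails returns a copy carrying formatted details, leaving the receiver
// (often a shared package-level value) untouched.
func (e *packError) AddDetails(format string, args ...interface{}) *packError {
	return &packError{reason: e.reason, details: fmt.Sprintf(format, args...)}
}

// Error renders "reason" or "reason: details".
func (e *packError) Error() string {
	if e.details == "" {
		return e.reason
	}
	return e.reason + ": " + e.details
}

var errBadSignature = newPackError("malformed pack file signature")

func main() {
	err := errBadSignature.AddDetails("got %q", "JUNK")
	fmt.Println(err) // malformed pack file signature: got "JUNK"
}
```

A consequence of copying is that the detailed error no longer compares equal to the sentinel with `==`, so callers would compare reasons instead.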

type Format

type Format int

Format specifies if the packfile uses ref-deltas or ofs-deltas.

const (
	UnknownFormat Format = iota
	OFSDeltaFormat
	REFDeltaFormat
)

Possible values of the Format type.

type Parser

type Parser struct {
	ReadRecaller
}

A Parser is a collection of functions to read and process data from a packfile. Values of this type are not zero-value safe; see the NewParser function below.

func NewParser

func NewParser(r ReadRecaller) *Parser

NewParser returns a new Parser that reads from the packfile represented by r.

func (*Parser) IsSupportedVersion

func (p *Parser) IsSupportedVersion(v uint32) bool

IsSupportedVersion returns whether version v is supported by the parser. The currently supported version is VersionSupported, defined above.

func (Parser) IsValidSignature

func (p Parser) IsValidSignature(sig []byte) bool

IsValidSignature reports whether sig is a valid packfile signature.

func (*Parser) ReadCount

func (p *Parser) ReadCount() (uint32, error)

ReadCount reads and returns the object count field of a packfile.

func (Parser) ReadHash

func (p Parser) ReadHash() (core.Hash, error)

ReadHash reads a hash.

func (Parser) ReadHeader

func (p Parser) ReadHeader() (uint32, error)

ReadHeader reads the whole packfile header (signature, version and object count). It returns the object count and performs checks on the validity of the signature and the version fields.

func (Parser) ReadNegativeOffset

func (p Parser) ReadNegativeOffset() (int64, error)

ReadNegativeOffset reads and returns an offset from an OFS_DELTA object entry in a packfile. OFS_DELTA offsets are stored in Git's special VLQ format:

Ordinary VLQ has some redundancies; for example, the number 358 can be encoded as the 2-octet VLQ 0x8266, the 3-octet VLQ 0x808266, the 4-octet VLQ 0x80808266, and so forth.

To avoid these redundancies, the VLQ format used in Git removes this prepending redundancy and extends the representable range of shorter VLQs by adding an offset to VLQs of 2 or more octets in such a way that the lowest possible value for such an (N+1)-octet VLQ becomes exactly one more than the maximum possible value for an N-octet VLQ. In particular, since a 1-octet VLQ can store a maximum value of 127, the minimum 2-octet VLQ (0x8000) is assigned the value 128 instead of 0. Conversely, the maximum value of such a 2-octet VLQ (0xff7f) is 16511 instead of just 16383. Similarly, the minimum 3-octet VLQ (0x808000) has a value of 16512 instead of zero, which means that the maximum 3-octet VLQ (0xffff7f) is 2113663 instead of just 2097151. And so forth.

This is how the offset is saved in C:

dheader[pos] = ofs & 127;
while (ofs >>= 7)
    dheader[--pos] = 128 | (--ofs & 127);
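On the decoding side, adding one before each shift is what removes the redundant encodings. The sketch below pairs such a decoder with a Go port of the C loop above; both function names are hypothetical. As the format text says, the decoded value is then subtracted from the position of the OFS_DELTA entry's type byte to locate the base object.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeOfsDeltaOffset decodes Git's offset VLQ and returns the value plus
// the number of bytes consumed.
func decodeOfsDeltaOffset(b []byte) (int64, int, error) {
	if len(b) == 0 {
		return 0, 0, errors.New("empty offset")
	}
	c := b[0]
	v := int64(c & 0x7f)
	i := 1
	for c&0x80 != 0 {
		if i >= len(b) {
			return 0, 0, errors.New("truncated offset")
		}
		c = b[i]
		i++
		// The +1 makes each encoding length start where the shorter one ends.
		v = ((v + 1) << 7) | int64(c&0x7f)
	}
	return v, i, nil
}

// encodeOfsDeltaOffset ports the C loop: it fills the buffer from the end,
// so the most significant 7-bit group ends up first.
func encodeOfsDeltaOffset(ofs uint64) []byte {
	var buf [10]byte // 10 bytes suffice for any uint64
	pos := len(buf) - 1
	buf[pos] = byte(ofs & 127)
	for ofs >>= 7; ofs != 0; ofs >>= 7 {
		ofs--
		pos--
		buf[pos] = 128 | byte(ofs&127)
	}
	return buf[pos:]
}

func main() {
	fmt.Printf("% x\n", encodeOfsDeltaOffset(358)) // 81 66
	v, n, _ := decodeOfsDeltaOffset([]byte{0xff, 0x7f})
	fmt.Println(v, n) // 16511 2
}
```

Under this scheme 358 is written as 0x81 0x66, whereas ordinary VLQ would use 0x82 0x66.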

func (Parser) ReadNonDeltaObjectContent

func (p Parser) ReadNonDeltaObjectContent() ([]byte, error)

ReadNonDeltaObjectContent reads and returns a non-deltified object from its zlib stream in an object entry in the packfile.

func (Parser) ReadOFSDeltaObjectContent

func (p Parser) ReadOFSDeltaObjectContent(start int64) (
	[]byte, core.ObjectType, error)

ReadOFSDeltaObjectContent reads and returns an object specified by an OFS-delta entry in the packfile, from its negative offset onwards. The start parameter is the offset of this particular object entry (the current offset minus the already processed type and length).

func (Parser) ReadObject

func (p Parser) ReadObject() (core.Object, error)

ReadObject reads and returns a git object from an object entry in the packfile. Non-deltified and deltified objects are supported.

func (Parser) ReadObjectTypeAndLength

func (p Parser) ReadObjectTypeAndLength() (core.ObjectType, int64, error)

ReadObjectTypeAndLength reads and returns the object type and the length field from an object entry in a packfile.

func (Parser) ReadREFDeltaObjectContent

func (p Parser) ReadREFDeltaObjectContent() ([]byte, core.ObjectType, error)

ReadREFDeltaObjectContent reads and returns an object specified by a REF-delta entry in the packfile, from the hash onwards.

func (*Parser) ReadSignature

func (p *Parser) ReadSignature() ([]byte, error)

ReadSignature reads and returns the signature field in the packfile.

func (Parser) ReadSolveDelta

func (p Parser) ReadSolveDelta(base []byte) ([]byte, error)

ReadSolveDelta reads and returns the base patched with the contents of a zlib compressed diff data in the delta portion of an object entry in the packfile.

func (*Parser) ReadVersion

func (p *Parser) ReadVersion() (uint32, error)

ReadVersion reads and returns the version field of a packfile.

type ReadRecaller

type ReadRecaller interface {
	// Read reads up to len(p) bytes into p.
	Read(p []byte) (int, error)
	// ReadByte is needed because of these:
	// - https://github.com/golang/go/commit/7ba54d45732219af86bde9a5b73c145db82b70c6
	// - https://groups.google.com/forum/#!topic/golang-nuts/fWTRdHpt0QI
	// - https://gowalker.org/compress/zlib#NewReader
	ReadByte() (byte, error)
	// Offset returns the number of bytes parsed so far from the
	// packfile.
	Offset() (int64, error)
	// Remember asks the ReadRecaller to remember the offset and hash for
	// an object, so you can later call RecallByOffset and RecallByHash.
	Remember(int64, core.Object) error
	// ForgetAll forgets all previously remembered objects.
	ForgetAll()
	// RecallByOffset returns the previously processed object found at a
	// given offset.
	RecallByOffset(int64) (core.Object, error)
	// RecallByHash returns the previously processed object with the
	// given hash.
	RecallByHash(core.Hash) (core.Object, error)
}

The ReadRecaller interface has all the functions needed by a packfile Parser to operate. We provide two very different implementations: Seekable and Stream.

type Seekable

type Seekable struct {
	io.ReadSeeker
	HashToOffset map[core.Hash]int64
}

Seekable implements ReadRecaller for the io.ReadSeeker of a packfile. Remembering does not actually store any reference to the remembered objects; the object offset is remembered instead, and the packfile is read again every time a recall operation is requested. This saves memory but can be very slow if the associated io.ReadSeeker is slow (like a hard disk).

func NewSeekable

func NewSeekable(r io.ReadSeeker) *Seekable

NewSeekable returns a new Seekable that reads from r.

func (*Seekable) ForgetAll

func (r *Seekable) ForgetAll()

ForgetAll forgets all previously remembered objects. For efficiency reasons, RecallByOffset always finds objects, even if they have been forgotten or were never remembered.

func (*Seekable) Offset

func (r *Seekable) Offset() (int64, error)

Offset returns the offset for the next Read or ReadByte.

func (*Seekable) Read

func (r *Seekable) Read(p []byte) (int, error)

Read reads up to len(p) bytes into p.

func (*Seekable) ReadByte

func (r *Seekable) ReadByte() (byte, error)

ReadByte reads a byte.

func (*Seekable) RecallByHash

func (r *Seekable) RecallByHash(h core.Hash) (core.Object, error)

RecallByHash returns the object for a given hash by looking for it again in the io.ReadSeeker.

func (*Seekable) RecallByOffset

func (r *Seekable) RecallByOffset(o int64) (obj core.Object, err error)

RecallByOffset returns the object for a given offset by looking for it again in the io.ReadSeeker. For efficiency reasons, this method always finds objects by offset, even if they have not been remembered or have been forgotten.

func (*Seekable) Remember

func (r *Seekable) Remember(o int64, obj core.Object) error

Remember stores the offset of the object and its hash, but not the object itself. This implementation does not check for already stored offsets, as it is too expensive to build this information from an index every time a get operation is performed on the SeekableReadRecaller.

type Stream

type Stream struct {
	io.Reader
	// contains filtered or unexported fields
}

Stream implements ReadRecaller for the io.Reader of a packfile. This implementation keeps all remembered objects referenced in maps for quick access.
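By contrast with Seekable, a Stream-style recaller keeps the objects themselves in maps keyed by entry offset and by hash, so recalls are cheap at the cost of memory. A minimal sketch: `hash` and `object` are hypothetical stand-ins for core.Hash and core.Object, and the error strings only mirror ErrDuplicatedObject and ErrCannotRecall.

```go
package main

import (
	"errors"
	"fmt"
)

// hash and object are hypothetical stand-ins for core.Hash and core.Object.
type hash [20]byte

type object struct {
	Hash    hash
	Content []byte
}

// memRecaller indexes remembered objects by entry offset and by hash.
type memRecaller struct {
	byOffset map[int64]*object
	byHash   map[hash]*object
}

func newMemRecaller() *memRecaller {
	r := &memRecaller{}
	r.ForgetAll() // allocates the maps
	return r
}

// Remember stores obj under both keys, rejecting repeated offsets or hashes.
func (r *memRecaller) Remember(o int64, obj *object) error {
	_, seenOffset := r.byOffset[o]
	_, seenHash := r.byHash[obj.Hash]
	if seenOffset || seenHash {
		return errors.New("duplicated object")
	}
	r.byOffset[o] = obj
	r.byHash[obj.Hash] = obj
	return nil
}

// ForgetAll drops every reference held by the recaller.
func (r *memRecaller) ForgetAll() {
	r.byOffset = make(map[int64]*object)
	r.byHash = make(map[hash]*object)
}

func (r *memRecaller) RecallByOffset(o int64) (*object, error) {
	if obj, ok := r.byOffset[o]; ok {
		return obj, nil
	}
	return nil, errors.New("cannot recall object")
}

func (r *memRecaller) RecallByHash(h hash) (*object, error) {
	if obj, ok := r.byHash[h]; ok {
		return obj, nil
	}
	return nil, errors.New("cannot recall object")
}

func main() {
	r := newMemRecaller()
	base := &object{Hash: hash{0xab}, Content: []byte("base")}
	if err := r.Remember(12, base); err != nil {
		panic(err)
	}
	got, err := r.RecallByOffset(12)
	fmt.Println(string(got.Content), err) // base <nil>
}
```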

func NewStream

func NewStream(r io.Reader) *Stream

NewStream returns a new Stream that reads from r.

func (*Stream) ForgetAll

func (r *Stream) ForgetAll()

ForgetAll forgets all previously remembered objects.

func (*Stream) Offset

func (r *Stream) Offset() (int64, error)

Offset returns the number of bytes read.

func (*Stream) Read

func (r *Stream) Read(p []byte) (n int, err error)

Read reads up to len(p) bytes into p.

func (*Stream) ReadByte

func (r *Stream) ReadByte() (byte, error)

ReadByte reads a byte.

func (*Stream) RecallByHash

func (r *Stream) RecallByHash(h core.Hash) (core.Object, error)

RecallByHash returns an object that has been previously Remember-ed by its hash.

func (*Stream) RecallByOffset

func (r *Stream) RecallByOffset(o int64) (core.Object, error)

RecallByOffset returns an object that has been previously Remember-ed by the offset of its object entry in the packfile.

func (*Stream) Remember

func (r *Stream) Remember(o int64, obj core.Object) error

Remember stores references to the passed object to be used later by RecallByHash and RecallByOffset. It receives the object and the offset of its object entry in the packfile.
