Documentation ¶
Overview ¶
Package rabin implements Rabin hashing (fingerprinting).
A given Rabin hash function is defined by a polynomial over GF(2):
p(x) = ... + p₂x² + p₁x + p₀ where pₙ ∈ GF(2)
The message to be hashed is likewise interpreted as a polynomial over GF(2), where the coefficients are the bits of the message in left-to-right most-significant-bit-first order. Given a message polynomial m(x) and a hashing polynomial p(x), the Rabin hash is simply the coefficients of m(x) mod p(x).
Rabin hashing has the unusual property that it can efficiently compute a "rolling hash" of a stream of data, where the hash value reflects only the most recent w bytes of the stream, for some window size w. This property makes it ideal for "content-defined chunking", which sub-divides sequential data on boundaries that are robust to insertions and deletions.
The details of Rabin hashing are described in Rabin, Michael (1981). "Fingerprinting by Random Polynomials." Center for Research in Computing Technology, Harvard University. Tech Report TR-CSE-03-01.
Index ¶
Constants ¶
const Poly64 = 0xbfe6b8a5bf378d83
Poly64 is an 64-bit (degree 63) irreducible polynomial over GF(2).
This is a convenient polynomial to use for computing 64-bit Rabin hashes.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Chunker ¶
type Chunker struct {
// contains filtered or unexported fields
}
A Chunker performs content-defined chunking. It divides a sequence of bytes into chunks such that insertions and deletions in the sequence will only affect chunk boundaries near those modifications.
func NewChunker ¶
NewChunker returns a content-defined chunker for data read from r using the Rabin hash defined by table. The chunks produced by this Chunker will be at least minBytes and at most maxBytes large and will, on average, be avgBytes large.
The Chunker buffers data from the Reader internally, so the Reader need not buffer itself. The caller may seek the reader, but if it does, it must only seek to a known chunk boundary and it must call Reset on the Chunker.
If the Reader additionally implements Discarder, the Chunker will use this to skip over bytes more efficiently.
The hash function defined by table must have a non-zero window size.
minBytes must be >= the window size. This ensures that chunk boundary n+1 does not depend on data from before chunk boundary n.
avgBytes must be a power of two.
func (*Chunker) Next ¶
Next returns the length in bytes of the next chunk. If there are no more chunks, it returns 0, io.EOF. If the underlying Reader returns some other error, it passes that error on to the caller.
func (*Chunker) Reset ¶
func (c *Chunker) Reset()
Reset resets c and clears its internal buffer. The caller must ensure that the underlying Reader is at a chunk boundary when calling Reset.
This is useful if the caller has knowledge of where an already-chunked stream is being modified. It can start at the chunk boundary before the modified point and re-chunk the stream until a new chunk boundary lines up with a boundary in the previous version of the stream.
type Discarder ¶
type Discarder interface { // Discard skips the next n bytes, returning the number of // bytes discarded. // // If Discard skips fewer than n bytes, it also returns an // error. Discard must not skip beyond the end of the file. Discard(n int) (discarded int, err error) }
A Discarder supports discarding bytes from an input stream.
type Hash ¶
type Hash struct {
// contains filtered or unexported fields
}
Hash computes Rabin hashes (often called fingerprints).
Hash implements hash.Hash64.
func (*Hash) BlockSize ¶
BlockSize returns the window size if a window is configured, and otherwise returns 1.
This satisfies the hash.Hash interface and indicates that Write is most efficient if writes are a multiple of the returned size.
func (*Hash) Size ¶
Size returns the number of bytes Sum will append. This is the minimum number of bytes necessary to represent the hash.
func (*Hash) Sum ¶
Sum appends the least-significant byte first representation of the current hash to b and returns the resulting slice.
type Table ¶
type Table struct {
// contains filtered or unexported fields
}
Table is a set of pre-computed tables for computing Rabin fingerprints for a given polynomial and window size.
func NewTable ¶
NewTable returns a Table for constructing Rabin hashes using the polynomial
p(x) = ... + p₂x² + p₁x + p₀ where pₙ ∈ GF(2)
where pₙ = (polynomial >> n) & 1. This polynomial must be irreducible and must have degree >= 8. The number of bits in the resulting hash values will be the same as the number of bits in polynomial.
This package defines Poly64 as a convenient 64-bit irreducible polynomial that can be used with this function.
If window > 0, hashes constructed from this Table will be rolling hash over only the most recently written window bytes of data.