Documentation ¶
Overview ¶
Package chunker breaks a stream of bytes into content-defined chunks: boundaries are chosen from checksums of a window that slides over the data. Because boundaries depend on nearby content rather than on absolute position, an edited sequence with insertions and removals can share many chunks with the original sequence.
The intent is that when a sequence of bytes is to be transmitted to a recipient that may have much of the data, the sequence can be broken down into chunks. The checksums of the resulting chunks may then be transmitted to the recipient, which can then discover which of the chunks it has, and which it needs.
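The exchange described above can be sketched as follows. This is a minimal illustration, not part of the package: the `digest` type, `chunkDigests`, and `missing` are hypothetical names, and SHA-256 is an assumed choice of chunk checksum.

```go
package main

import "crypto/sha256"

// digest is the SHA-256 checksum of one chunk (an assumed checksum choice).
type digest [sha256.Size]byte

// chunkDigests returns the checksum of every chunk, in order.
func chunkDigests(chunks [][]byte) []digest {
	out := make([]digest, len(chunks))
	for i, c := range chunks {
		out[i] = sha256.Sum256(c)
	}
	return out
}

// missing returns the indices of the sender's chunks whose checksums are
// absent from the set the recipient already holds.
func missing(senderDigests []digest, have map[digest]bool) []int {
	var need []int
	for i, d := range senderDigests {
		if !have[d] {
			need = append(need, i)
		}
	}
	return need
}
```

The sender would transmit only the output of `chunkDigests`; the recipient would run `missing` against its own store and request just the listed chunks.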
Example:
	var s *chunker.Stream = chunker.NewStream(&chunker.DefaultParam, anIOReader)
	for s.Advance() {
		var chunk []byte = s.Value()
		// process chunk
	}
	if s.Err() != nil {
		// anIOReader generated an error.
	}
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Param ¶
type Param struct {
	WindowWidth int    // the window size to use when looking for chunk boundaries
	MinChunk    int64  // minimum chunk size
	MaxChunk    int64  // maximum chunk size
	Primary     uint64 // primary divisor; the expected chunk size
	Secondary   uint64 // secondary divisor
}
A Param contains the parameters for chunking.
Chunks are broken based on a hash of a sliding window of width WindowWidth bytes. Each chunk is at most MaxChunk bytes long, and, unless end-of-file or an error is reached, at least MinChunk bytes long.
Subject to those constraints, a chunk boundary is introduced at the first point where the hash of the sliding window is 1 mod Primary. If that does not occur before MaxChunk bytes, the boundary is placed at the last position where the hash is 1 mod Secondary, or, if that does not occur either, after MaxChunk bytes. Normally, MinChunk < Primary < MaxChunk. Primary is the expected chunk size. The Secondary divisor exists to make it more likely that a chunk boundary is selected based on the local data when the Primary divisor by chance does not find a match for a long distance. It should be a few times smaller than Primary.
Using primes for Primary and Secondary is not essential, but recommended because it guarantees mixing of the checksum bits should their distribution be non-uniform.
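The boundary rules above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `params`, `windowHash`, `nextBoundary`, and `split` are hypothetical names, FNV-1a stands in for whatever checksum the package uses, the hash is recomputed at each position rather than rolled, and the window restarts at the beginning of every chunk. Primary and Secondary are assumed to be greater than 1.

```go
package main

import "hash/fnv"

// params is a local stand-in mirroring the fields of Param.
type params struct {
	WindowWidth int
	MinChunk    int64
	MaxChunk    int64
	Primary     uint64
	Secondary   uint64
}

// windowHash hashes one window of bytes. FNV-1a is an assumption made for
// this sketch; it is recomputed per position instead of being rolled.
func windowHash(w []byte) uint64 {
	h := fnv.New64a()
	h.Write(w)
	return h.Sum64()
}

// nextBoundary returns the length of the first chunk of data.
func nextBoundary(data []byte, p params) int64 {
	n := int64(len(data))
	if n <= p.MinChunk {
		return n // end of input reached before MinChunk bytes
	}
	limit := p.MaxChunk
	if n < limit {
		limit = n
	}
	first := p.MinChunk
	if first < 1 {
		first = 1 // never emit an empty chunk
	}
	secondary := int64(-1)
	for end := first; end <= limit; end++ {
		start := end - int64(p.WindowWidth)
		if start < 0 {
			continue // window not yet full
		}
		h := windowHash(data[start:end])
		if h%p.Primary == 1 {
			return end // the first primary match wins
		}
		if h%p.Secondary == 1 {
			secondary = end // remember the last secondary match
		}
	}
	if limit == p.MaxChunk && secondary >= 0 {
		return secondary // fall back to the last secondary match
	}
	return limit // MaxChunk bytes, or the end of the input
}

// split cuts data into consecutive chunks using nextBoundary.
func split(data []byte, p params) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := nextBoundary(data, p)
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}
```

With, for example, `params{WindowWidth: 8, MinChunk: 16, MaxChunk: 256, Primary: 61, Secondary: 13}`, every chunk returned by `split` except possibly the last is between MinChunk and MaxChunk bytes long.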
type PosStream ¶
type PosStream struct {
// contains filtered or unexported fields
}
A PosStream is just like a Stream, except that the Value() method returns only the byte offsets of the ends of chunks, rather than the chunks themselves. It can be used when chunks are too large for even a small number of them to be buffered comfortably in memory.
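One way to consume end offsets without buffering whole chunks is to stream each byte range through a hash. The sketch below assumes the offsets have already been collected into a slice and that the data is available through an io.ReaderAt; `digestChunks` and `digestString` are hypothetical helpers, and SHA-256 is an assumed checksum.

```go
package main

import (
	"crypto/sha256"
	"io"
	"strings"
)

// digestChunks hashes each chunk of r, where ends holds the byte offset of
// the end of each chunk in increasing order. Each chunk is streamed through
// the hash, so no whole chunk is held in memory at once.
func digestChunks(r io.ReaderAt, ends []int64) ([][sha256.Size]byte, error) {
	sums := make([][sha256.Size]byte, 0, len(ends))
	start := int64(0)
	for _, end := range ends {
		h := sha256.New()
		section := io.NewSectionReader(r, start, end-start)
		if _, err := io.Copy(h, section); err != nil {
			return nil, err
		}
		var sum [sha256.Size]byte
		copy(sum[:], h.Sum(nil))
		sums = append(sums, sum)
		start = end // the next chunk begins where this one ended
	}
	return sums, nil
}

// digestString is a small usage helper that applies digestChunks to a string.
func digestString(s string, ends []int64) ([][sha256.Size]byte, error) {
	return digestChunks(strings.NewReader(s), ends)
}
```

For instance, `digestString("abcabc", []int64{3, 6})` yields two equal checksums, since both chunks are "abc".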
func NewPosStream ¶
NewPosStream() returns a pointer to a new PosStream instance, with the parameters in *param.
func (*PosStream) Advance ¶
Advance() stages the offset of the end of the next chunk so that it may be retrieved via Value(). Returns true iff there is an item to retrieve. Advance() must be called before Value() is called.
func (*PosStream) Cancel ¶
func (ps *PosStream) Cancel()
Cancel() causes the next call to Advance() to return false. It should be used when the client does not wish to iterate to the end of the stream. Never blocks. May be called concurrently with other method calls on ps.
type Stream ¶
type Stream struct {
// contains filtered or unexported fields
}
A Stream allows a client to iterate over the chunks within an io.Reader byte stream.
func NewStream ¶
NewStream() returns a pointer to a new Stream instance, with the parameters in *param.
func (*Stream) Advance ¶
Advance() stages the next chunk so that it may be retrieved via Value(). Returns true iff there is an item to retrieve. Advance() must be called before Value() is called.
func (*Stream) Cancel ¶
func (s *Stream) Cancel()
Cancel() causes the next call to Advance() to return false. It should be used when the client does not wish to iterate to the end of the stream. Never blocks. May be called concurrently with other method calls on s.
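The Advance/Value/Err/Cancel protocol can be illustrated with a self-contained stand-in. The `fixedStream` type below is hypothetical and cuts fixed-size pieces instead of content-defined chunks; the atomic flag is one assumed way to make Cancel non-blocking and safe to call concurrently, and is not taken from the package's source.

```go
package main

import (
	"io"
	"strings"
	"sync/atomic"
)

// fixedStream is a hypothetical iterator that follows the same
// Advance/Value/Err/Cancel protocol but cuts fixed-size pieces.
type fixedStream struct {
	r        io.Reader
	buf      []byte
	n        int   // number of valid bytes staged in buf
	err      error // first non-EOF read error, if any
	canceled int32 // set to 1 by Cancel; accessed atomically
}

func newFixedStream(r io.Reader, size int) *fixedStream {
	return &fixedStream{r: r, buf: make([]byte, size)}
}

// Advance stages the next piece and reports whether one is available.
func (s *fixedStream) Advance() bool {
	if atomic.LoadInt32(&s.canceled) != 0 || s.err != nil {
		return false
	}
	n, err := io.ReadFull(s.r, s.buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		s.err = err
		return false
	}
	s.n = n
	return n > 0 // a short final piece is still returned
}

// Value returns the piece staged by the most recent Advance.
func (s *fixedStream) Value() []byte { return s.buf[:s.n] }

// Err returns the read error that ended the iteration, if any.
func (s *fixedStream) Err() error { return s.err }

// Cancel makes the next Advance return false. It never blocks.
func (s *fixedStream) Cancel() { atomic.StoreInt32(&s.canceled, 1) }

// collect drains a stream over input, calling Cancel once limit pieces
// have been seen (a negative limit means never cancel).
func collect(input string, size, limit int) []string {
	s := newFixedStream(strings.NewReader(input), size)
	var out []string
	for s.Advance() {
		out = append(out, string(s.Value()))
		if len(out) == limit {
			s.Cancel() // the next Advance returns false
		}
	}
	return out
}
```

Here `collect` cancels from the iterating goroutine for simplicity; the same call could be made from another goroutine, since the flag is read and written atomically.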