package chunker

Version: v0.0.0-...-ba1c585 (latest) | Published: Jun 29, 2017 | License: BSD-3-Clause | Imports: 5 | Imported by: 0

Repository

github.com/vanadium/go.ref

Documentation

Overview

Package chunker breaks a stream of bytes into content-defined chunks whose boundaries are chosen based on content checksums of a window that slides over the data. Because each boundary depends only on the bytes near it, an edited sequence with insertions and removals can share many chunks with the original sequence.

The intent is that when a sequence of bytes is to be transmitted to a recipient that may have much of the data, the sequence can be broken down into chunks. The checksums of the resulting chunks may then be transmitted to the recipient, which can then discover which of the chunks it has, and which it needs.
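The exchange described above can be sketched independently of how the chunks are produced. In this minimal sketch, SHA-256 stands in for the chunk checksum (the package documentation does not specify one), and the helpers digests and missing are hypothetical: the sender transmits one digest per chunk, and the recipient answers with the indices of the chunks it lacks.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// digest is the per-chunk checksum the sender transmits.
// SHA-256 is an assumption of this sketch, not something the package mandates.
type digest [sha256.Size]byte

// digests computes one checksum per chunk, in stream order.
func digests(chunks [][]byte) []digest {
	ds := make([]digest, len(chunks))
	for i, c := range chunks {
		ds[i] = sha256.Sum256(c)
	}
	return ds
}

// missing runs on the recipient: given the sender's checksum list and the
// set of chunk checksums it already holds, it returns the indices to request.
func missing(sent []digest, have map[digest]bool) []int {
	var need []int
	for i, d := range sent {
		if !have[d] {
			need = append(need, i)
		}
	}
	return need
}

func main() {
	// The recipient already holds the chunks "alpha" and "gamma".
	have := map[digest]bool{}
	for _, d := range digests([][]byte{[]byte("alpha"), []byte("gamma")}) {
		have[d] = true
	}
	sent := digests([][]byte{[]byte("alpha"), []byte("beta"), []byte("gamma")})
	fmt.Println(missing(sent, have)) // [1]
}
```

Only the checksums cross the wire up front; the recipient then fetches just the chunks whose indices it returned.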

Example:

	// ctx (a *context.T) and anIOReader (an io.Reader) are supplied by the caller.
	s := chunker.NewStream(ctx, &chunker.DefaultParam, anIOReader)
	for s.Advance() {
		chunk := s.Value()
		_ = chunk // process chunk
	}
	if err := s.Err(); err != nil {
		// anIOReader generated an error.
	}

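Index

type Param
type PosStream
- func NewPosStream(ctx *context.T, param *Param, rd io.Reader) *PosStream
- func (ps *PosStream) Advance() bool
- func (ps *PosStream) Cancel()
- func (ps *PosStream) Err() error
- func (ps *PosStream) Value() int64
type Stream
- func NewStream(ctx *context.T, param *Param, rd io.Reader) *Stream
- func (s *Stream) Advance() bool
- func (s *Stream) Cancel()
- func (s *Stream) Err() (err error)
- func (s *Stream) Value() []byte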

Types

type Param

type Param struct {
	WindowWidth int    // the window size to use when looking for chunk boundaries
	MinChunk    int64  // minimum chunk size
	MaxChunk    int64  // maximum chunk size
	Primary     uint64 // primary divisor; the expected chunk size
	Secondary   uint64 // secondary divisor
}

A Param contains the parameters for chunking.

Chunks are broken based on a hash of a sliding window of width WindowWidth bytes. Each chunk is at most MaxChunk bytes long, and, unless end-of-file or an error is reached, at least MinChunk bytes long.

Subject to those constraints, a chunk boundary is introduced at the first point where the hash of the sliding window is 1 mod Primary. If no such point occurs before MaxChunk bytes, the boundary is placed at the last position where the hash is 1 mod Secondary; if that does not occur either, the chunk ends after MaxChunk bytes. Normally, MinChunk < Primary < MaxChunk. Primary is the expected chunk size. The Secondary divisor makes it more likely that a chunk boundary is selected based on the local data when, by chance, the Primary divisor finds no match for a long distance. It should be a few times smaller than Primary.

Using primes for Primary and Secondary is not essential, but recommended because it guarantees mixing of the checksum bits should their distribution be non-uniform.

var DefaultParam Param = Param{WindowWidth: 48, MinChunk: 512, MaxChunk: 3072, Primary: 601, Secondary: 307}

DefaultParam contains default chunking parameters.
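The relationships recommended for Param can be captured in a small check. checkParam is a hypothetical helper, not part of the package, and Param here is a local copy of the documented struct. It treats the documented "normally" relationships as errors; a real caller might prefer to warn. The second configuration is only an illustrative choice of larger primes, not a recommended setting.

```go
package main

import (
	"errors"
	"fmt"
)

// Param mirrors chunker.Param.
type Param struct {
	WindowWidth int
	MinChunk    int64
	MaxChunk    int64
	Primary     uint64
	Secondary   uint64
}

// checkParam reports the first violated recommendation, or nil.
func checkParam(p Param) error {
	switch {
	case p.WindowWidth <= 0:
		return errors.New("WindowWidth must be positive")
	case p.MinChunk < 0 || p.MinChunk > p.MaxChunk:
		return errors.New("need 0 <= MinChunk <= MaxChunk")
	case uint64(p.MinChunk) >= p.Primary || p.Primary >= uint64(p.MaxChunk):
		return errors.New("expected MinChunk < Primary < MaxChunk")
	case p.Secondary == 0 || p.Secondary >= p.Primary:
		return errors.New("expected 0 < Secondary < Primary")
	}
	return nil
}

func main() {
	def := Param{WindowWidth: 48, MinChunk: 512, MaxChunk: 3072, Primary: 601, Secondary: 307}
	fmt.Println(checkParam(def)) // <nil>

	// Hypothetical configuration aiming at larger chunks (8191 and 2053 are prime).
	big := Param{WindowWidth: 48, MinChunk: 2048, MaxChunk: 32768, Primary: 8191, Secondary: 2053}
	fmt.Println(checkParam(big)) // <nil>
}
```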

type PosStream

type PosStream struct {
	// contains filtered or unexported fields
}

A PosStream is just like a Stream, except that its Value() method returns only the byte offset of the end of each chunk, rather than the chunk itself. It can be used when chunks are too large to comfortably buffer even a few of them in memory.

func NewPosStream

func NewPosStream(ctx *context.T, param *Param, rd io.Reader) *PosStream

NewPosStream() returns a pointer to a new PosStream that reads from rd and chunks its contents using the parameters in *param.

func (*PosStream) Advance

func (ps *PosStream) Advance() bool

Advance() stages the offset of the end of the next chunk so that it may be retrieved via Value(). Returns true iff there is an item to retrieve. Advance() must be called before Value() is called.

func (*PosStream) Cancel

func (ps *PosStream) Cancel()

Cancel() causes the next call to Advance() to return false. It should be used when the client does not wish to iterate to the end of the stream. Never blocks. May be called concurrently with other method calls on ps.
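One way to meet Cancel's contract (never blocks, safe to call concurrently) is to have it set an atomic flag that Advance checks. The sketch below is a hypothetical stand-in iterator over a fixed slice of offsets, not the package's implementation; it requires Go 1.19 or later for atomic.Bool.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sliceStream mimics the Advance/Value/Cancel contract of PosStream.
// Only the cancelled flag is shared across goroutines; the other fields
// belong to the goroutine that iterates.
type sliceStream struct {
	offsets   []int64
	next      int
	cur       int64
	cancelled atomic.Bool // set by Cancel, read by Advance
}

func (s *sliceStream) Advance() bool {
	if s.cancelled.Load() || s.next >= len(s.offsets) {
		return false
	}
	s.cur = s.offsets[s.next]
	s.next++
	return true
}

func (s *sliceStream) Value() int64 { return s.cur }

// Cancel only stores a flag, so it never blocks.
func (s *sliceStream) Cancel() { s.cancelled.Store(true) }

func main() {
	s := &sliceStream{offsets: []int64{600, 1400, 2000, 2900}}
	for s.Advance() {
		fmt.Println(s.Value()) // prints 600 and 1400, then stops
		if s.Value() >= 1400 {
			s.Cancel() // the next Advance returns false
		}
	}
}
```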

func (*PosStream) Err

func (ps *PosStream) Err() error

Err() returns any error encountered by Advance(). Never blocks.

func (*PosStream) Value

func (ps *PosStream) Value() int64

Value() returns the end offset of the chunk that was staged by Advance(). May panic if Advance() returned false or was not called. Never blocks.

type Stream

type Stream struct {
	// contains filtered or unexported fields
}

A Stream allows a client to iterate over the chunks within an io.Reader byte stream.

func NewStream

func NewStream(ctx *context.T, param *Param, rd io.Reader) *Stream

NewStream() returns a pointer to a new Stream that reads from rd and chunks its contents using the parameters in *param.

func (*Stream) Advance

func (s *Stream) Advance() bool

Advance() stages the next chunk so that it may be retrieved via Value(). Returns true iff there is an item to retrieve. Advance() must be called before Value() is called.

func (*Stream) Cancel

func (s *Stream) Cancel()

Cancel() causes the next call to Advance() to return false. It should be used when the client does not wish to iterate to the end of the stream. Never blocks. May be called concurrently with other method calls on s.

func (*Stream) Err

func (s *Stream) Err() (err error)

Err() returns any error encountered by Advance(). Never blocks.

func (*Stream) Value

func (s *Stream) Value() []byte

Value() returns the chunk that was staged by Advance(). May panic if Advance() returned false or was not called. Never blocks.

Source Files

chunker.go
