Documentation ¶
Overview ¶
Package dedup implements a duplication-reducing reader for streams of length-delimited byte records. Each record is read as a varint-encoded length in bytes, followed immediately by the record itself.
A stream consists of a sequence of such records packed consecutively without additional padding. There are no checksums or compression. See also: kythe.io/kythe/go/platform/delimited.
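As a rough illustration of the layout described above, the following sketch builds a small stream by hand. It assumes the length prefix is the unsigned base-128 varint provided by encoding/binary; the helpers appendRecord and encodeStream are hypothetical names used only for this example and are not part of package dedup.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// appendRecord appends one length-delimited record to buf: an unsigned varint
// length in bytes, followed immediately by the record itself.
// Hypothetical helper for illustration; not part of package dedup.
func appendRecord(buf *bytes.Buffer, rec []byte) {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(rec)))
	buf.Write(hdr[:n])
	buf.Write(rec)
}

// encodeStream packs the records consecutively, with no padding, checksums,
// or compression between them.
func encodeStream(recs ...string) []byte {
	var buf bytes.Buffer
	for _, r := range recs {
		appendRecord(&buf, []byte(r))
	}
	return buf.Bytes()
}

func main() {
	stream := encodeStream("foo", "", "hello")
	// Each record is its length prefix followed by its bytes.
	fmt.Printf("% x\n", stream) // prints "03 66 6f 6f 00 05 68 65 6c 6c 6f"
}
```

Note that an empty record is still represented on the wire, as a single zero length byte.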
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader implements the Reader interface. Duplicate records are removed by hashing each record and checking the hash against a set of known record hashes. This is a quick-and-dirty method of removing duplicates; it reduces duplication but is not guaranteed to remove every duplicate.
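The hash-and-check idea can be sketched as follows. This is not the package's implementation: the choice of SHA-256, the unbounded map, and the names hashSet, isUnique, and dedupRecords are all assumptions made for illustration. In particular, the real Reader keeps its known hashes in a cache limited to maxSize bytes, which this sketch does not model.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashSet is a hypothetical stand-in for the set of known record hashes.
// Unlike the package's cache, it grows without bound.
type hashSet map[[sha256.Size]byte]struct{}

// isUnique reports whether rec has not been seen before, and records its
// hash so that later copies are recognized.
func (s hashSet) isUnique(rec []byte) bool {
	h := sha256.Sum256(rec)
	if _, seen := s[h]; seen {
		return false
	}
	s[h] = struct{}{}
	return true
}

// dedupRecords returns recs with repeated records dropped, preserving the
// order of first occurrence.
func dedupRecords(recs []string) []string {
	seen := make(hashSet)
	var out []string
	for _, r := range recs {
		if seen.isUnique([]byte(r)) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	fmt.Println(dedupRecords([]string{"a", "b", "a", "c", "b"})) // prints "[a b c]"
}
```

Storing fixed-size hashes rather than whole records keeps the memory cost per known record constant, regardless of record length.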
func NewReader ¶
NewReader returns a reader that consumes records from r, using a cache of up to maxSize bytes for known record hashes.
func (*Reader) Next ¶
Next returns the next length-delimited record from the input, or io.EOF if there are no more records available. It returns io.ErrUnexpectedEOF if a short record is found, that is, a record with a declared length of n but fewer than n bytes of data. Because the format has no resynchronization mechanism, it is generally not possible to recover from a short record.
The slice returned is valid only until a subsequent call to Next.