Documentation ¶
Index ¶
- Variables
- type CDC_Config
- func (z *CDC_Config) DecodeMsg(dc *msgp.Reader) (err error)
- func (z CDC_Config) EncodeMsg(en *msgp.Writer) (err error)
- func (z CDC_Config) MarshalMsg(b []byte) (o []byte, err error)
- func (z CDC_Config) Msgsize() (s int)
- func (z *CDC_Config) UnmarshalMsg(bts []byte) (o []byte, err error)
- func (z *CDC_Config) UnmarshalMsgWithCfg(bts []byte, cfg *msgp.RuntimeConfig) (o []byte, err error)
- type Cutpointer
- type FastCDC
- type UltraCDC
Constants ¶
This section is empty.
Variables ¶
var ErrMaxSize = errors.New("MaxSize is required and must be 64B <= MaxSize <= 1GB && MaxSize > TargetSize")
var ErrMinSize = errors.New("MinSize is required and must be 64B <= MinSize <= 1GB && MinSize < TargetSize")
var ErrTargetSize = errors.New("TargetSize is required and must be 64B <= TargetSize <= 1GB")
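The three error messages above imply a set of range and ordering constraints. The following is a minimal sketch of those checks, not the package's own validation code. The `sizes` struct, the `checkSizes` function, the local error values, and the reading of 1GB as 2^30 bytes are all assumptions made for illustration, since the fields of CDC_Config are not listed here.

```go
package main

import (
	"errors"
	"fmt"
)

// sizes is a hypothetical stand-in for the size fields of CDC_Config.
type sizes struct {
	MinSize, TargetSize, MaxSize int
}

const (
	minBytes = 64      // 64B lower bound from the error messages
	maxBytes = 1 << 30 // 1GB upper bound, assuming 1GB means 2^30 bytes
)

// Local stand-ins for ErrTargetSize, ErrMinSize and ErrMaxSize.
var (
	errTarget = errors.New("TargetSize out of range")
	errMin    = errors.New("MinSize out of range or >= TargetSize")
	errMax    = errors.New("MaxSize out of range or <= TargetSize")
)

func inRange(v int) bool { return v >= minBytes && v <= maxBytes }

// checkSizes applies the constraints stated in the three error messages:
// every size lies in [64B, 1GB], and MinSize < TargetSize < MaxSize.
func checkSizes(s sizes) error {
	if !inRange(s.TargetSize) {
		return errTarget
	}
	if !inRange(s.MinSize) || s.MinSize >= s.TargetSize {
		return errMin
	}
	if !inRange(s.MaxSize) || s.MaxSize <= s.TargetSize {
		return errMax
	}
	return nil
}

func main() {
	fmt.Println(checkSizes(sizes{MinSize: 2048, TargetSize: 8192, MaxSize: 65536}))
	fmt.Println(checkSizes(sizes{MinSize: 8192, TargetSize: 8192, MaxSize: 65536}))
}
```

The second call fails because MinSize equals TargetSize, and the messages require MinSize to be strictly smaller.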
Functions ¶
This section is empty.
Types ¶
type CDC_Config ¶
func Default_FastCDC_Options ¶
func Default_FastCDC_Options() *CDC_Config
func Default_UltraCDC_Options ¶
func Default_UltraCDC_Options() *CDC_Config
Default_UltraCDC_Options gives each caller its own copy, because users frequently modify the config they get back.
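The copy-per-caller behaviour can be illustrated with a small sketch. The `config` type, its `TargetSize` value, and `defaultOptions` are hypothetical stand-ins, not the package's actual defaults.

```go
package main

import "fmt"

// config is a hypothetical stand-in for CDC_Config.
type config struct {
	TargetSize int
}

// defaults is a package-level template; the value is illustrative only.
var defaults = config{TargetSize: 8192}

// defaultOptions returns a pointer to a fresh copy on every call,
// so one caller's changes never leak into another caller's config.
func defaultOptions() *config {
	c := defaults // struct assignment copies the value
	return &c
}

func main() {
	a := defaultOptions()
	b := defaultOptions()
	a.TargetSize = 1 << 20 // only a is modified
	fmt.Println(a.TargetSize, b.TargetSize, defaults.TargetSize)
}
```

Returning a pointer to a shared package-level value instead would let one caller's modification silently change the defaults seen by everyone else.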
func (*CDC_Config) DecodeMsg ¶
func (z *CDC_Config) DecodeMsg(dc *msgp.Reader) (err error)
DecodeMsg implements msgp.Decodable. We treat empty fields as if we read a Nil from the wire.
func (CDC_Config) EncodeMsg ¶
func (z CDC_Config) EncodeMsg(en *msgp.Writer) (err error)
EncodeMsg implements msgp.Encodable
func (CDC_Config) MarshalMsg ¶
func (z CDC_Config) MarshalMsg(b []byte) (o []byte, err error)
MarshalMsg implements msgp.Marshaler
func (CDC_Config) Msgsize ¶
func (z CDC_Config) Msgsize() (s int)
Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message
func (*CDC_Config) UnmarshalMsg ¶
func (z *CDC_Config) UnmarshalMsg(bts []byte) (o []byte, err error)
UnmarshalMsg implements msgp.Unmarshaler
func (*CDC_Config) UnmarshalMsgWithCfg ¶
func (z *CDC_Config) UnmarshalMsgWithCfg(bts []byte, cfg *msgp.RuntimeConfig) (o []byte, err error)
type Cutpointer ¶
type Cutpointer interface {
Cutpoints(data []byte, maxPoints int) (cuts []int)
Name() string
Config() *CDC_Config
}
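To show the shape of the interface, here is a sketch of a trivial implementation. `fixedCutter` is hypothetical and is not content-defined: it simply cuts every TargetSize bytes. The local `CDC_Config` with a single `TargetSize` field is a stand-in, and the sketch assumes that cuts holds increasing end offsets into data.

```go
package main

import "fmt"

// CDC_Config is a local stand-in with one illustrative field.
type CDC_Config struct {
	TargetSize int
}

// Cutpointer is copied from the interface documented above.
type Cutpointer interface {
	Cutpoints(data []byte, maxPoints int) (cuts []int)
	Name() string
	Config() *CDC_Config
}

// fixedCutter is a hypothetical implementation that cuts every
// TargetSize bytes. Real chunkers choose cuts based on content.
type fixedCutter struct {
	opts *CDC_Config
}

func (f *fixedCutter) Name() string        { return "fixed" }
func (f *fixedCutter) Config() *CDC_Config { return f.opts }

func (f *fixedCutter) Cutpoints(data []byte, maxPoints int) (cuts []int) {
	step := f.opts.TargetSize
	if step <= 0 {
		return []int{len(data)} // degenerate config: one segment
	}
	pos := 0
	for {
		end := pos + step
		if end > len(data) {
			end = len(data)
		}
		cuts = append(cuts, end)
		pos = end
		if pos >= len(data) || (maxPoints > 0 && len(cuts) >= maxPoints) {
			return cuts
		}
	}
}

func main() {
	var cp Cutpointer = &fixedCutter{opts: &CDC_Config{TargetSize: 4}}
	fmt.Println(cp.Name(), cp.Cutpoints(make([]byte, 10), 0)) // fixed [4 8 10]
	fmt.Println(cp.Cutpoints(make([]byte, 10), 2))            // [4 8]
}
```

Code written against Cutpointer can then switch between chunkers without changing the calling code.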
type FastCDC ¶
type FastCDC struct {
Opts *CDC_Config `zid:"0"`
}
func NewFastCDC ¶
func NewFastCDC(opts *CDC_Config) *FastCDC
func (*FastCDC) Algorithm ¶
func (c *FastCDC) Algorithm(options *CDC_Config, data []byte, N int) (cutpoint int)
Algorithm is a modified FastCDC: it is not the same as the algorithm in the original paper. We use a uint64 for the hash, so it is 64 bits wide. The gear table is the 64-bit version, which provides slightly better dedup.
References:
[0] https://github.com/google/cdc-file-transfer/blob/main/fastcdc/fastcdc.h
[1] https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf
[2] https://github.com/dbaarda/rollsum-chunking/blob/master/RESULTS.rst
[3] https://www.usenix.org/system/files/conference/atc12/atc12-final293.pdf
Notes on the API:
Algorithm's return value, cutpoint, is typically used next as segment := data[:cutpoint], so the byte at index cutpoint itself is excluded from the segment. In these notes n is the length argument (spelled N in the signature above). Commonly, when n == len(data) and data is short, the returned cutpoint is n, because n is the default returned when no earlier cutpoint is found. The segment := data[:len(data)] then takes all of data as the segment to hash.
PRE condition: n must be <= len(data). We will panic if this does not hold. It is always safe to pass n = len(data).
POST INVARIANT: cutpoint <= n. We never return a cutpoint > n.
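The contract described in these notes suggests a simple consuming loop. The sketch below uses `nextCut`, a hypothetical stand-in that honors the same precondition and postcondition but cuts after each newline rather than using a rolling hash. It also assumes the returned cutpoint is greater than zero whenever n > 0, which the loop needs in order to make progress.

```go
package main

import "fmt"

// nextCut is a hypothetical stand-in for Algorithm. It panics if
// n > len(data), never returns more than n, and returns n when it
// finds no earlier cutpoint.
func nextCut(data []byte, n int) int {
	if n > len(data) {
		panic("n must be <= len(data)")
	}
	for i := 0; i < n; i++ {
		if data[i] == '\n' {
			return i + 1 // data[:i+1] includes the newline
		}
	}
	return n
}

// split consumes data one segment at a time.
func split(data []byte) []string {
	var out []string
	for len(data) > 0 {
		cut := nextCut(data, len(data)) // passing len(data) is always safe
		out = append(out, string(data[:cut]))
		data = data[cut:] // the byte at index cut starts the next segment
	}
	return out
}

func main() {
	fmt.Printf("%q\n", split([]byte("ab\ncd\nef"))) // ["ab\n" "cd\n" "ef"]
}
```

The final segment "ef" contains no newline, so the stand-in falls back to returning n and the loop takes the remainder of data.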
func (*FastCDC) Config ¶
func (c *FastCDC) Config() *CDC_Config
func (*FastCDC) Cutpoints ¶
func (c *FastCDC) Cutpoints(data []byte, maxPoints int) (cuts []int)
If maxPoints <= 0, Cutpoints computes every cutpoint in data in a single batch; otherwise it returns at most maxPoints cutpoints, and may find fewer. cuts always holds at least one entry, because len(data) is returned when no earlier cutpoint is found. If maxPoints <= 0, the last cutpoint in cuts is always len(data).
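A caller usually turns the returned cuts into segments. The helper below is a sketch under the assumption that cuts holds increasing end offsets into data, which is consistent with the last entry being len(data). The `segments` function and the sample cuts are illustrative and not part of the package.

```go
package main

import "fmt"

// segments slices data at each cutpoint. It assumes cuts holds
// increasing end offsets into data, so segment i is data[prev:cuts[i]].
func segments(data []byte, cuts []int) [][]byte {
	out := make([][]byte, 0, len(cuts))
	prev := 0
	for _, c := range cuts {
		out = append(out, data[prev:c])
		prev = c
	}
	return out
}

func main() {
	data := []byte("hello world!")
	cuts := []int{5, 6, 12} // illustrative cutpoints; the last is len(data)
	for _, s := range segments(data, cuts) {
		fmt.Printf("%q\n", s)
	}
}
```

The segments share memory with data, so copy them first if data will be reused as a read buffer.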
func (*FastCDC) Validate ¶
func (c *FastCDC) Validate(options *CDC_Config) error
type UltraCDC ¶
type UltraCDC struct {
Opts *CDC_Config `zid:"0"`
}
func NewUltraCDC ¶
func NewUltraCDC(opts *CDC_Config) *UltraCDC
NewUltraCDC is for non-Plakar standalone clients. Plakar clients will use newUltraCDC via the chunkers.NewChunker("ultracdc", ...) factory.
func (*UltraCDC) Algorithm ¶
func (c *UltraCDC) Algorithm(options *CDC_Config, data []byte, n int) (cutpoint int)
Algorithm's return value, cutpoint, is typically used next as segment := data[:cutpoint], so the byte at index cutpoint itself is excluded from the segment. Commonly, when n == len(data) and data is short, the returned cutpoint is n, because n is the default returned when no earlier cutpoint is found. The segment := data[:len(data)] then takes all of data as the segment to hash.
PRE condition: n must be <= len(data). We will panic if this does not hold. It is always safe to pass n = len(data).
POST INVARIANT: cutpoint <= n. We never return a cutpoint > n.
func (*UltraCDC) Config ¶
func (c *UltraCDC) Config() *CDC_Config
func (*UltraCDC) Cutpoints ¶
func (c *UltraCDC) Cutpoints(data []byte, maxPoints int) (cuts []int)
If maxPoints <= 0, Cutpoints computes every cutpoint in data in a single batch; otherwise it returns at most maxPoints cutpoints, and may find fewer. cuts always holds at least one entry, because len(data) is returned when no earlier cutpoint is found. If maxPoints <= 0, the last cutpoint in cuts is always len(data).
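When maxPoints > 0 the last returned cutpoint may fall short of len(data), so a caller that wants bounded batches has to resume from it. The sketch below shows one way to do that with `cutpoints`, a hypothetical fixed-step stand-in. It assumes cuts are offsets relative to the slice passed in, and that restarting at a cutpoint yields the same later cuts as a single pass, which holds for this stand-in but is not confirmed here for UltraCDC.

```go
package main

import "fmt"

// cutpoints is a hypothetical stand-in with the same shape as Cutpoints:
// it cuts every 4 bytes and returns at most maxPoints cuts when maxPoints > 0.
func cutpoints(data []byte, maxPoints int) (cuts []int) {
	const step = 4
	pos := 0
	for {
		end := pos + step
		if end > len(data) {
			end = len(data)
		}
		cuts = append(cuts, end)
		pos = end
		if pos >= len(data) || (maxPoints > 0 && len(cuts) >= maxPoints) {
			return cuts
		}
	}
}

// allCuts gathers every cutpoint of data, asking for at most batch
// cuts per call and resuming from the last cut of the previous call.
func allCuts(data []byte, batch int) []int {
	var all []int
	base := 0
	for base < len(data) {
		cuts := cutpoints(data[base:], batch)
		for _, c := range cuts {
			all = append(all, base+c) // convert to offsets into data
		}
		base += cuts[len(cuts)-1]
	}
	return all
}

func main() {
	fmt.Println(allCuts(make([]byte, 10), 2)) // [4 8 10]
}
```

Bounding the batch size keeps the cuts slice small when data is very large, at the cost of extra calls.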
func (*UltraCDC) Validate ¶
func (c *UltraCDC) Validate(options *CDC_Config) error