jcdc

package
v1.6.10
Published: Jan 11, 2025; License: BSD-3-Clause, ISC; Imports: 6; Imported by: 0

Documentation

Constants

This section is empty.

Variables

var ErrMaxSize = errors.New("MaxSize is required and must be 64B <= MaxSize <= 1GB && MaxSize > TargetSize")
var ErrMinSize = errors.New("MinSize is required and must be 64B <= MinSize <= 1GB && MinSize < TargetSize")
var ErrTargetSize = errors.New("TargetSize is required and must be 64B <= TargetSize <= 1GB")
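
The three error messages imply a set of bounds on the config fields. The following is a minimal sketch of a check consistent with those messages. The `config` struct, the `validate` function, the order of the checks, and the reading of "1GB" as 2^30 bytes are all assumptions made for illustration; this is not the package's Validate implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins mirroring the exported error variables above.
var (
	errMinSize    = errors.New("MinSize is required and must be 64B <= MinSize <= 1GB && MinSize < TargetSize")
	errTargetSize = errors.New("TargetSize is required and must be 64B <= TargetSize <= 1GB")
	errMaxSize    = errors.New("MaxSize is required and must be 64B <= MaxSize <= 1GB && MaxSize > TargetSize")
)

const (
	lowerBound = 64      // 64 bytes
	upperBound = 1 << 30 // assumption: "1GB" means 2^30 bytes
)

// config is a hypothetical local copy of the three CDC_Config fields.
type config struct {
	MinSize, TargetSize, MaxSize int
}

// validate sketches checks consistent with the error messages.
// The order in which fields are checked is an assumption.
func validate(c config) error {
	inRange := func(v int) bool { return v >= lowerBound && v <= upperBound }
	if !inRange(c.TargetSize) {
		return errTargetSize
	}
	if !inRange(c.MinSize) || c.MinSize >= c.TargetSize {
		return errMinSize
	}
	if !inRange(c.MaxSize) || c.MaxSize <= c.TargetSize {
		return errMaxSize
	}
	return nil
}

func main() {
	// A config that satisfies MinSize < TargetSize < MaxSize within bounds.
	fmt.Println(validate(config{MinSize: 2048, TargetSize: 8192, MaxSize: 65536}))
	// MinSize equal to TargetSize violates the strict inequality.
	fmt.Println(validate(config{MinSize: 8192, TargetSize: 8192, MaxSize: 65536}))
}
```

In real code, callers would compare the returned error against the exported ErrMinSize, ErrTargetSize, and ErrMaxSize variables, for example with errors.Is.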

Functions

This section is empty.

Types

type CDC_Config

type CDC_Config struct {
	MinSize    int `zid:"0"`
	TargetSize int `zid:"1"`
	MaxSize    int `zid:"2"`
}

func Default_FastCDC_Options

func Default_FastCDC_Options() *CDC_Config

func Default_UltraCDC_Options

func Default_UltraCDC_Options() *CDC_Config

Users frequently modify what they get back here, so each caller is given its own copy.
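
To see why a per-caller copy matters, here is a small sketch. The `config` struct and `defaultOptions` function are hypothetical stand-ins, and the sizes are placeholders rather than the package's real defaults.

```go
package main

import "fmt"

// config is a hypothetical local copy of the CDC_Config fields.
type config struct {
	MinSize, TargetSize, MaxSize int
}

// defaultOptions returns a pointer to a freshly allocated struct on
// every call. The sizes are placeholders, not the package's defaults.
func defaultOptions() *config {
	return &config{MinSize: 2 * 1024, TargetSize: 8 * 1024, MaxSize: 64 * 1024}
}

func main() {
	a := defaultOptions()
	b := defaultOptions()
	a.TargetSize = 16 * 1024 // caller A tunes its own copy

	fmt.Println(a != b)       // the two callers hold distinct pointers
	fmt.Println(b.TargetSize) // caller B is unaffected by A's change
}
```

Had the function returned a pointer to one shared package-level struct, the change made by the first caller would have leaked into the second.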

func (*CDC_Config) DecodeMsg

func (z *CDC_Config) DecodeMsg(dc *msgp.Reader) (err error)

DecodeMsg implements msgp.Decodable. Empty fields are treated as if a Nil had been read from the wire.

func (CDC_Config) EncodeMsg

func (z CDC_Config) EncodeMsg(en *msgp.Writer) (err error)

EncodeMsg implements msgp.Encodable

func (CDC_Config) MarshalMsg

func (z CDC_Config) MarshalMsg(b []byte) (o []byte, err error)

MarshalMsg implements msgp.Marshaler

func (CDC_Config) Msgsize

func (z CDC_Config) Msgsize() (s int)

Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message

func (*CDC_Config) UnmarshalMsg

func (z *CDC_Config) UnmarshalMsg(bts []byte) (o []byte, err error)

UnmarshalMsg implements msgp.Unmarshaler

func (*CDC_Config) UnmarshalMsgWithCfg

func (z *CDC_Config) UnmarshalMsgWithCfg(bts []byte, cfg *msgp.RuntimeConfig) (o []byte, err error)

type Cutpointer

type Cutpointer interface {
	Cutpoints(data []byte, maxPoints int) (cuts []int)
	Name() string
	Config() *CDC_Config
}
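
As an illustration of the interface's shape, the sketch below redeclares it locally and satisfies it with a toy fixed-size chunker. The names `config`, `cutpointer`, and `fixedSize` are hypothetical. The toy is not content-defined and does not reflect how FastCDC or UltraCDC choose cutpoints; it only follows the documented maxPoints convention.

```go
package main

import "fmt"

// config and cutpointer are local stand-ins for CDC_Config and Cutpointer.
type config struct {
	MinSize, TargetSize, MaxSize int
}

type cutpointer interface {
	Cutpoints(data []byte, maxPoints int) (cuts []int)
	Name() string
	Config() *config
}

// fixedSize is a hypothetical chunker that cuts every TargetSize bytes.
type fixedSize struct{ opts *config }

func (f *fixedSize) Name() string    { return "fixed-size-demo" }
func (f *fixedSize) Config() *config { return f.opts }

func (f *fixedSize) Cutpoints(data []byte, maxPoints int) (cuts []int) {
	step := f.opts.TargetSize
	if step <= 0 {
		return []int{len(data)} // guard against a loop that never advances
	}
	for pos := 0; pos < len(data); {
		pos += step
		if pos > len(data) {
			pos = len(data) // the final cut is the end of the data
		}
		cuts = append(cuts, pos)
		if maxPoints > 0 && len(cuts) >= maxPoints {
			break // caller asked for at most maxPoints cuts
		}
	}
	return cuts
}

func main() {
	var c cutpointer = &fixedSize{opts: &config{MinSize: 64, TargetSize: 100, MaxSize: 200}}
	data := make([]byte, 250)
	fmt.Println(c.Name(), c.Cutpoints(data, 0)) // all cuts
	fmt.Println(c.Name(), c.Cutpoints(data, 2)) // at most two cuts
}
```

Code written against the interface can switch between chunkers without changing the calling code.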

type FastCDC

type FastCDC struct {
	Opts *CDC_Config `zid:"0"`
}

func NewFastCDC

func NewFastCDC(opts *CDC_Config) *FastCDC

func (*FastCDC) Algorithm

func (c *FastCDC) Algorithm(options *CDC_Config, data []byte, N int) (cutpoint int)

Modified FastCDC algorithm: not the same as the original paper! We use a uint64 for the hash, so it is 64 bits wide. The gear table is the 64-bit version, which provides slightly better dedup.

References:

[0] https://github.com/google/cdc-file-transfer/blob/main/fastcdc/fastcdc.h
[1] https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf
[2] https://github.com/dbaarda/rollsum-chunking/blob/master/RESULTS.rst
[3] https://www.usenix.org/system/files/conference/atc12/atc12-final293.pdf
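
For readers unfamiliar with gear-based chunking, the following is a generic sketch of a 64-bit gear rolling hash with a single mask. It is not this package's modified algorithm: the `gearCut` function, its parameters, and the gear table (filled here by a splitmix64 generator) are assumptions made purely for illustration.

```go
package main

import "fmt"

// gear is an illustrative 256-entry table of 64-bit values filled by a
// splitmix64 generator. It is NOT the table used by the package.
var gear = func() (t [256]uint64) {
	x := uint64(0x9E3779B97F4A7C15)
	for i := range t {
		x += 0x9E3779B97F4A7C15
		z := x
		z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9
		z = (z ^ (z >> 27)) * 0x94D049BB133111EB
		t[i] = z ^ (z >> 31)
	}
	return t
}()

// gearCut returns a cutpoint in data[:n]: the index just past the first
// byte, at or beyond minSize, where the top maskBits bits of the rolling
// hash are all zero, or min(n, maxSize) if no such byte is found.
func gearCut(data []byte, n, minSize, maxSize int, maskBits uint) int {
	if n > len(data) {
		panic("n must be <= len(data)")
	}
	end := n
	if end > maxSize {
		end = maxSize
	}
	// Test the high bits: in a shift-and-add gear hash they depend on
	// more of the recent bytes than the low bits do.
	mask := ^uint64(0) << (64 - maskBits)
	var h uint64
	for i := 0; i < end; i++ {
		h = (h << 1) + gear[data[i]]
		if i+1 >= minSize && h&mask == 0 {
			return i + 1
		}
	}
	return end
}

func main() {
	// Deterministic filler bytes from a simple LCG.
	data := make([]byte, 1<<16)
	x := uint32(1)
	for i := range data {
		x = x*1664525 + 1013904223
		data[i] = byte(x >> 24)
	}
	chunks := 0
	for pos := 0; pos < len(data); chunks++ {
		pos += gearCut(data[pos:], len(data)-pos, 256, 4096, 10)
	}
	fmt.Println("chunks:", chunks)
}
```

Because the cut decision depends only on recently seen bytes, an insertion early in the data tends to shift nearby boundaries while later boundaries fall on the same content again, which is what makes deduplication effective.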

Notes on the API:

In these notes, n refers to the third argument, which is named N in the signature above. Algorithm's return value, cutpoint, is typically used next as segment := data[:cutpoint], so the byte at index cutpoint itself is excluded from the segment. When n == len(data) and data is short, the returned cutpoint is commonly n, because n is the default returned when no shorter cutpoint is found. In that case segment := data[:len(data)] takes all of data as the segment to hash.

PRE condition: n must be <= len(data). We will panic if this does not hold. It is always safe to pass n = len(data).

POST INVARIANT: cutpoint <= n. We never return a cutpoint > n.
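
The contract above can be exercised with a simple loop. In this sketch, `algorithm` is a hypothetical function type with the same contract as the Algorithm method minus the options argument, and `everyFour` is a toy stand-in rather than a real chunker.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// algorithm is a hypothetical stand-in for the Algorithm method:
// n <= len(data) on entry, cutpoint <= n on return.
type algorithm func(data []byte, n int) (cutpoint int)

// everyFour is a toy stand-in that cuts after 4 bytes, or at n if
// fewer than 4 bytes are available.
func everyFour(data []byte, n int) int {
	if n > len(data) {
		panic("n must be <= len(data)")
	}
	if n < 4 {
		return n
	}
	return 4
}

// segments repeatedly asks alg for the next cutpoint and takes
// data[:cutpoint], so the byte at index cutpoint starts the next segment.
func segments(data []byte, alg algorithm) [][]byte {
	var out [][]byte
	for len(data) > 0 {
		cut := alg(data, len(data)) // n = len(data) is always safe
		if cut <= 0 {
			break // defensive: a zero cut would make no progress
		}
		out = append(out, data[:cut])
		data = data[cut:]
	}
	return out
}

func main() {
	for _, seg := range segments([]byte("0123456789"), everyFour) {
		fmt.Printf("%q %x\n", seg, sha256.Sum256(seg))
	}
}
```

The final segment is shorter than the others because the stand-in falls back to returning n when no earlier cutpoint exists, which matches the default described above.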

func (*FastCDC) Config

func (c *FastCDC) Config() *CDC_Config

func (*FastCDC) Cutpoints

func (c *FastCDC) Cutpoints(data []byte, maxPoints int) (cuts []int)

If maxPoints <= 0, Cutpoints computes every cutpoint in data in a single batch; otherwise it returns at most maxPoints cutpoints, and possibly fewer. There will always be at least one, because len(data) is returned in cuts if no earlier cutpoint is found. If maxPoints <= 0, the last cutpoint in cuts will always be len(data).
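
A returned cuts slice can be turned into segments as sketched below. This assumes that cuts holds ascending offsets into data, each being the exclusive end of a segment, which is suggested by the last cut being len(data). The `splitByCuts` helper and the sample cut values are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitByCuts slices data at each cut, assuming cuts are ascending
// offsets into data where each cut is the exclusive end of a segment.
func splitByCuts(data []byte, cuts []int) [][]byte {
	segs := make([][]byte, 0, len(cuts))
	prev := 0
	for _, cut := range cuts {
		segs = append(segs, data[prev:cut])
		prev = cut
	}
	return segs
}

func main() {
	data := []byte("hello, content-defined chunking")
	cuts := []int{7, 23, len(data)} // hypothetical result of Cutpoints(data, 0)
	segs := splitByCuts(data, cuts)
	for _, s := range segs {
		fmt.Printf("%q\n", s)
	}
	// When the last cut is len(data), the segments concatenate back
	// to the original input.
	fmt.Println(bytes.Equal(bytes.Join(segs, nil), data))
}
```

With maxPoints > 0 the last cut may fall short of len(data), in which case the caller would need to handle the unprocessed tail data[cuts[len(cuts)-1]:] separately.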

func (*FastCDC) Name

func (c *FastCDC) Name() string

func (*FastCDC) Validate

func (c *FastCDC) Validate(options *CDC_Config) error

type UltraCDC

type UltraCDC struct {
	Opts *CDC_Config `zid:"0"`
}

func NewUltraCDC

func NewUltraCDC(opts *CDC_Config) *UltraCDC

NewUltraCDC is for non-Plakar standalone clients. Plakar clients will use newUltraCDC via the chunkers.NewChunker("ultracdc", ...) factory.

func (*UltraCDC) Algorithm

func (c *UltraCDC) Algorithm(options *CDC_Config, data []byte, n int) (cutpoint int)

Algorithm's return value, cutpoint, is typically used next as segment := data[:cutpoint], so the byte at index cutpoint itself is excluded from the segment. When n == len(data) and data is short, the returned cutpoint is commonly n, because n is the default returned when no shorter cutpoint is found. In that case segment := data[:len(data)] takes all of data as the segment to hash.

PRE condition: n must be <= len(data). We will panic if this does not hold. It is always safe to pass n = len(data).

POST INVARIANT: cutpoint <= n. We never return a cutpoint > n.

func (*UltraCDC) Config

func (c *UltraCDC) Config() *CDC_Config

func (*UltraCDC) Cutpoints

func (c *UltraCDC) Cutpoints(data []byte, maxPoints int) (cuts []int)

If maxPoints <= 0, Cutpoints computes every cutpoint in data in a single batch; otherwise it returns at most maxPoints cutpoints, and possibly fewer. There will always be at least one, because len(data) is returned in cuts if no earlier cutpoint is found. If maxPoints <= 0, the last cutpoint in cuts will always be len(data).

func (*UltraCDC) Name

func (c *UltraCDC) Name() string

func (*UltraCDC) Validate

func (c *UltraCDC) Validate(options *CDC_Config) error
