Documentation
Constants
const (
MinFileSize = 512 // Minimum file size for a Sdbf file.
)
Variables
var (
	BfSize         uint32 = 256    // BfSize is the size of each bloom filter.
	PopWinSize     uint32 = 64     // PopWinSize is the size of the sliding window used to hash input.
	MaxElem        uint32 = 160    // MaxElem is the maximum number of elements in each bloom filter in stream mode.
	MaxElemDd      uint32 = 192    // MaxElemDd is the maximum number of elements in each bloom filter in block mode.
	Threshold      uint32 = 16     // Threshold is the minimum score above which chunks are considered.
	BlockSize             = 4 * kB // BlockSize is the block size used to generate chunk ranks.
	EntropyWinSize        = 64     // EntropyWinSize is the entropy window size used to generate chunk ranks.
)
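The relation between the filter size and the per-filter element limits can be made concrete with a little arithmetic. The sketch below assumes that BfSize is expressed in bytes, which this reference does not state. The function bitsPerElement is a hypothetical helper and is not claimed to match how the package computes BitsPerElem.

```go
package main

import "fmt"

// Values copied from the variable block above.
const (
	bfSize    = 256 // ASSUMPTION: interpreted as a size in bytes
	maxElem   = 160 // stream mode limit
	maxElemDd = 192 // block mode limit
)

// bitsPerElement returns how many filter bits are available per element
// when a filter of sizeBytes bytes holds elems elements.
func bitsPerElement(sizeBytes, elems uint32) float64 {
	return float64(sizeBytes*8) / float64(elems)
}

func main() {
	fmt.Printf("stream mode: %.2f bits per element\n", bitsPerElement(bfSize, maxElem))   // 12.80
	fmt.Printf("block mode:  %.2f bits per element\n", bitsPerElement(bfSize, maxElemDd)) // 10.67
}
```

Under that assumption, a full filter in block mode is slightly denser than a full filter in stream mode.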
Functions
This section is empty.
Types
type BloomFilter
type BloomFilter interface {
	// ElemCount returns the number of elements in the BloomFilter.
	ElemCount() uint64
	// MaxElem returns the maximum number of elements that can be present in the BloomFilter.
	MaxElem() uint64
	// BitsPerElem returns the number of bits for each element of the BloomFilter.
	BitsPerElem() float64
	// WriteToFile serializes the current BloomFilter to the file specified by filename.
	WriteToFile(filename string) error
	// String returns the serialized representation of the BloomFilter.
	String() string
	// contains filtered or unexported methods
}
BloomFilter represents a bloom filter and is used to calculate similarity digests.
func NewBloomFilter
func NewBloomFilter() BloomFilter
NewBloomFilter returns a new BloomFilter with the default initial values.
func NewBloomFilterFromIndexFile
func NewBloomFilterFromIndexFile(indexFileName string) (BloomFilter, error)
NewBloomFilterFromIndexFile reads a serialized BloomFilter from a file.
func NewBloomFilterFromString
func NewBloomFilterFromString(filter string) (BloomFilter, error)
NewBloomFilterFromString creates a new BloomFilter from a serialized string.
type Sdbf
type Sdbf interface {
	// Name returns the name of the file or data this Sdbf represents.
	Name() string
	// Size returns the size of the hash data for this Sdbf.
	Size() uint64
	// InputSize returns the size of the data that the hash was generated from.
	InputSize() uint64
	// FilterCount returns the number of bloom filters.
	FilterCount() uint32
	// Compare compares two Sdbf and returns a similarity score between 0 and 100.
	// A score of 0 means that the two files are very different; a score of 100 means that the two files are equal.
	Compare(other Sdbf) int
	// CompareSample compares two Sdbf with sampling and returns a similarity score between 0 and 100.
	// A score of 0 means that the two files are very different; a score of 100 means that the two files are equal.
	CompareSample(other Sdbf, sample uint32) int
	// String returns the encoded Sdbf as a string.
	String() string
	// GetIndex returns the BloomFilter index used during the digesting process.
	GetIndex() BloomFilter
	// GetSearchIndexesResults returns the search index results.
	// The return value is an array of length len(searchIndexes), and each element is another array of length bfCount.
	GetSearchIndexesResults() [][]uint32
	// Fast modifies the bloom filter buffer for faster comparison.
	// Warning: the operation overwrites the original buffer.
	Fast()
}
Sdbf represents the similarity digest of a file and can be compared for similarity with other Sdbf values.
func ParseSdbfFromString
ParseSdbfFromString decodes a Sdbf from a digest string.
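Because Compare returns an integer score from 0 to 100, a common use is to rank a set of candidate digests against one target. The sketch below illustrates that idea without depending on the package. The constraint digest, the type match, the function rank, and the stand-in type fakeDigest with its hard-coded scores are all hypothetical and exist only for illustration. No real digest is computed here.

```go
package main

import (
	"fmt"
	"sort"
)

// digest is a hypothetical constraint naming the two documented Sdbf
// methods this sketch relies on: Name and Compare.
type digest[T any] interface {
	Name() string
	Compare(other T) int
}

// match pairs a candidate name with its similarity score.
type match struct {
	Name  string
	Score int
}

// rank compares target with every candidate and returns the results
// sorted from the highest score to the lowest.
func rank[T digest[T]](target T, candidates []T) []match {
	out := make([]match, 0, len(candidates))
	for _, c := range candidates {
		out = append(out, match{Name: c.Name(), Score: target.Compare(c)})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

// fakeDigest is a stand-in with a precomputed score table (made-up values).
type fakeDigest struct {
	name   string
	scores map[string]int
}

func (f fakeDigest) Name() string                 { return f.name }
func (f fakeDigest) Compare(other fakeDigest) int { return f.scores[other.name] }

func main() {
	target := fakeDigest{
		name:   "report.doc",
		scores: map[string]int{"report_v2.doc": 87, "photo.jpg": 0},
	}
	candidates := []fakeDigest{{name: "photo.jpg"}, {name: "report_v2.doc"}}
	for _, m := range rank(target, candidates) {
		fmt.Printf("%-14s %3d\n", m.Name, m.Score)
	}
}
```

The generic constraint is a design choice made so that any type with matching Name and Compare methods could be passed to rank; whether a production caller needs that generality is a judgment call.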
type SdbfFactory
type SdbfFactory interface {
	// WithBlockSize sets the block size for block mode.
	// With the default value of 0, the Sdbf is generated in stream mode.
	WithBlockSize(blockSize uint32) SdbfFactory
	// WithInitialIndex sets the initial BloomFilter index.
	// If no initial index is set, the factory creates a new empty BloomFilter.
	WithInitialIndex(initialIndex BloomFilter) SdbfFactory
	// WithSearchIndexes sets a list of BloomFilter which are checked for similarity during the digesting process.
	// If no value is set, searching during the digesting process is disabled.
	WithSearchIndexes(searchIndexes []BloomFilter) SdbfFactory
	// WithName sets the name of the Sdbf in the output.
	WithName(name string) SdbfFactory
	// Compute starts the digesting process and returns a Sdbf with the result.
	Compute() Sdbf
}
SdbfFactory can be used to create a Sdbf from a binary source.
func CreateSdbfFromBytes
func CreateSdbfFromBytes(buffer []uint8) (SdbfFactory, error)
CreateSdbfFromBytes returns a factory which can produce a Sdbf from a byte buffer.
func CreateSdbfFromFilename
func CreateSdbfFromFilename(filename string) (SdbfFactory, error)
CreateSdbfFromFilename returns a factory which can produce a Sdbf of a file.
func CreateSdbfFromReader
func CreateSdbfFromReader(r io.Reader) (SdbfFactory, error)
CreateSdbfFromReader returns a factory which can produce a Sdbf from an io.Reader.
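Each With method of SdbfFactory returns a SdbfFactory, so options can be chained before the final call to Compute. The sketch below imitates that call style with a toy type so it can run without the package. The type toyFactory and the function newToyFactory are hypothetical, the toy Compute returns a description string rather than a Sdbf, and mutating the receiver is an assumption about style, not a statement about the package's implementation.

```go
package main

import "fmt"

// toyFactory is a hypothetical stand-in for SdbfFactory. It stores only
// the two options used in this sketch.
type toyFactory struct {
	name      string
	blockSize uint32 // 0 selects stream mode, as documented for WithBlockSize
}

// newToyFactory returns a factory with the default (zero) options.
func newToyFactory() *toyFactory { return &toyFactory{} }

// WithBlockSize sets the block size and returns the factory for chaining.
func (f *toyFactory) WithBlockSize(blockSize uint32) *toyFactory {
	f.blockSize = blockSize
	return f
}

// WithName sets the name and returns the factory for chaining.
func (f *toyFactory) WithName(name string) *toyFactory {
	f.name = name
	return f
}

// Compute only reports which mode the options select; the documented
// Compute returns a Sdbf instead.
func (f *toyFactory) Compute() string {
	mode := "stream"
	if f.blockSize != 0 {
		mode = "block"
	}
	return fmt.Sprintf("%s: %s mode", f.name, mode)
}

func main() {
	fmt.Println(newToyFactory().WithName("a.bin").Compute())                          // a.bin: stream mode
	fmt.Println(newToyFactory().WithName("b.bin").WithBlockSize(16 * 1024).Compute()) // b.bin: block mode
}
```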