bloom

package
v0.0.0-...-5d42db8
Warning

This package is not in the latest version of its module.

Published: Jul 12, 2023 License: Apache-2.0 Imports: 5 Imported by: 1

Documentation

Overview

Package bloom implements parquet bloom filters.

Constants

const (
	// BlockSize is the size of bloom filter blocks in bytes.
	BlockSize = 32
)

Variables

This section is empty.

Functions

func CheckSplitBlock

func CheckSplitBlock(r io.ReaderAt, n int64, x uint64) (bool, error)

CheckSplitBlock is similar to bloom.SplitBlockFilter.Check but reads the bloom filter of n bytes from r.

The size n of the bloom filter is assumed to be a multiple of the block size.
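
The sketch below illustrates how a probe against an io.ReaderAt can touch a single 32-byte block instead of loading the whole filter. It is not this package's source: the lowercase names (blockOffset, checkSplitBlock, insertInto, probeBytes) are hypothetical, and the block selection, salt constants, and little-endian word layout are assumptions taken from the Parquet split block bloom filter specification as recalled here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

const blockSize = 32 // mirrors BlockSize in this package

// salt holds the eight constants from the Parquet specification
// (assumption: quoted from memory, verify before reuse).
var salt = [8]uint32{
	0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
	0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
}

// blockOffset returns the byte offset of the block selected by the upper
// 32 bits of x in a filter of n bytes.
func blockOffset(n int64, x uint64) int64 {
	numBlocks := uint64(n / blockSize)
	return int64(((x>>32)*numBlocks)>>32) * blockSize
}

// checkSplitBlock reads only the block that x maps to and tests its bits.
func checkSplitBlock(r io.ReaderAt, n int64, x uint64) (bool, error) {
	if n < blockSize {
		return false, nil // a filter without blocks contains nothing
	}
	var buf [blockSize]byte
	if _, err := r.ReadAt(buf[:], blockOffset(n, x)); err != nil {
		return false, err
	}
	for i, s := range salt {
		word := binary.LittleEndian.Uint32(buf[4*i:])
		bit := uint32(1) << ((uint32(x) * s) >> 27)
		if word&bit == 0 {
			return false, nil
		}
	}
	return true, nil
}

// insertInto sets the bits for x in a serialized filter that holds at
// least one block. It exists only to build test data for the demo.
func insertInto(filter []byte, x uint64) {
	off := blockOffset(int64(len(filter)), x)
	block := filter[off : off+blockSize]
	for i, s := range salt {
		bit := uint32(1) << ((uint32(x) * s) >> 27)
		word := binary.LittleEndian.Uint32(block[4*i:])
		binary.LittleEndian.PutUint32(block[4*i:], word|bit)
	}
}

// probeBytes wraps an in-memory buffer in a reader and probes it.
func probeBytes(filter []byte, x uint64) (bool, error) {
	return checkSplitBlock(bytes.NewReader(filter), int64(len(filter)), x)
}

func main() {
	filter := make([]byte, 4*blockSize)
	insertInto(filter, 0xdeadbeefcafef00d)
	ok, err := probeBytes(filter, 0xdeadbeefcafef00d)
	fmt.Println(ok, err)
}
```

Reading one block per probe keeps the I/O cost constant regardless of filter size, which is presumably why the function takes an io.ReaderAt rather than a byte slice.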

func NumSplitBlocksOf

func NumSplitBlocksOf(numValues int64, bitsPerValue uint) int

NumSplitBlocksOf returns the number of blocks in a filter sized to hold numValues values at bitsPerValue bits of filter per value.

This function is useful to determine the number of blocks when creating bloom filters in memory, for example:

f := make(bloom.SplitBlockFilter, bloom.NumSplitBlocksOf(n, 10))
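
As a rough illustration of the sizing arithmetic, the sketch below converts a value count and a bits-per-value budget into a block count. The function name numSplitBlocks is hypothetical, and rounding up first to whole bytes and then to whole blocks is an assumption about how the package behaves, not a quote of its source.

```go
package main

import "fmt"

const blockSize = 32 // bytes per block, mirrors BlockSize

// numSplitBlocks rounds the requested number of bits up to whole bytes,
// then rounds the bytes up to whole 32-byte blocks.
func numSplitBlocks(numValues int64, bitsPerValue uint) int {
	numBits := uint64(numValues) * uint64(bitsPerValue)
	numBytes := (numBits + 7) / 8
	return int((numBytes + blockSize - 1) / blockSize)
}

func main() {
	// 1000 values at 10 bits each need 1250 bytes, which rounds up to 40 blocks.
	fmt.Println(numSplitBlocks(1000, 10))
}
```

Under these assumptions a filter for 1000 values at 10 bits per value occupies 40 blocks, or 1280 bytes.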

Types

type Block

type Block [8]Word

Block represents a bloom filter block, which contains eight 32-bit words.

func (*Block) Bytes

func (b *Block) Bytes() []byte

Bytes returns b as a byte slice.

func (*Block) Check

func (b *Block) Check(x uint32) bool

func (*Block) Insert

func (b *Block) Insert(x uint32)
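
Check and Insert carry no description on this page. The sketch below shows how block-level insertion and lookup work in the Parquet split block bloom filter specification, on the assumption that this package follows it: each of the eight words gets exactly one bit, chosen by multiplying the key with a per-word salt. The lowercase type and method names are hypothetical, and the salt constants are recalled from the specification.

```go
package main

import "fmt"

// block is a local stand-in for the package's Block type.
type block [8]uint32

// salt holds the eight constants from the Parquet specification
// (assumption: quoted from memory, verify before reuse).
var salt = [8]uint32{
	0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
	0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
}

// mask derives one bit per word: the top five bits of x*salt[i]
// select a bit position between 0 and 31.
func mask(x uint32) (m block) {
	for i, s := range salt {
		m[i] = uint32(1) << ((x * s) >> 27)
	}
	return m
}

// insert sets the eight bits derived from x.
func (b *block) insert(x uint32) {
	for i, bit := range mask(x) {
		b[i] |= bit
	}
}

// check reports whether all eight bits derived from x are set.
func (b *block) check(x uint32) bool {
	for i, bit := range mask(x) {
		if b[i]&bit == 0 {
			return false
		}
	}
	return true
}

func main() {
	var b block
	fmt.Println(b.check(7)) // nothing inserted yet
	b.insert(7)
	fmt.Println(b.check(7)) // all eight bits for 7 are now set
}
```

Because a lookup only touches one 32-byte block, it stays within a single cache line on most hardware, which is the main motivation for the split block design.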

type Filter

type Filter interface {
	Check(uint64) bool
}

Filter is an interface representing read-only bloom filters where programs can probe for the possible presence of a hash key.
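
A minimal sketch of how calling code might depend on the interface rather than on a concrete filter type follows. The names filter, exactSet, and candidates are hypothetical; exactSet is a map-based stand-in that never reports false positives, whereas a real bloom filter may.

```go
package main

import "fmt"

// filter mirrors the single-method Filter interface of this package.
type filter interface {
	Check(uint64) bool
}

// exactSet is a hypothetical stand-in implementation backed by a map.
type exactSet map[uint64]struct{}

func (s exactSet) Check(x uint64) bool {
	_, ok := s[x]
	return ok
}

// candidates keeps only the keys that f cannot rule out, so that a more
// expensive lookup can be skipped for the rest.
func candidates(f filter, keys []uint64) []uint64 {
	var out []uint64
	for _, k := range keys {
		if f.Check(k) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	f := exactSet{10: {}, 30: {}}
	fmt.Println(candidates(f, []uint64{10, 20, 30}))
}
```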

type Hash

type Hash interface {
	// Returns the 64 bit hash of the value passed as argument.
	Sum64(value []byte) uint64

	// Compute hashes of individual values of primitive types.
	Sum64Uint8(value uint8) uint64
	Sum64Uint16(value uint16) uint64
	Sum64Uint32(value uint32) uint64
	Sum64Uint64(value uint64) uint64
	Sum64Uint128(value [16]byte) uint64

	// Compute hashes of the array of fixed size values passed as arguments,
	// returning the number of hashes written to the destination buffer.
	MultiSum64Uint8(dst []uint64, src []uint8) int
	MultiSum64Uint16(dst []uint64, src []uint16) int
	MultiSum64Uint32(dst []uint64, src []uint32) int
	MultiSum64Uint64(dst []uint64, src []uint64) int
	MultiSum64Uint128(dst []uint64, src [][16]byte) int
}

Hash is an interface abstracting the hashing algorithm used in bloom filters.

Hash instances must be safe to use concurrently from multiple goroutines.
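
To illustrate the shape of an implementation, the sketch below covers three of the interface's methods with a stateless type, which is one straightforward way to satisfy the concurrency requirement. The fnv1a type is a hypothetical stand-in using the FNV-1a algorithm, not the XXH64 type this package provides. Two details are assumptions: integers are hashed in their little-endian encoding, and the multi-value method writes min(len(dst), len(src)) hashes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fnv1a is a hypothetical stand-in hash with no fields, so concurrent
// calls from multiple goroutines share no mutable state.
type fnv1a struct{}

// Sum64 computes the 64-bit FNV-1a hash of value.
func (fnv1a) Sum64(value []byte) uint64 {
	h := uint64(14695981039346656037) // FNV-1a 64-bit offset basis
	for _, c := range value {
		h ^= uint64(c)
		h *= 1099511628211 // FNV 64-bit prime
	}
	return h
}

// Sum64Uint32 hashes the little-endian encoding of value (assumption).
func (f fnv1a) Sum64Uint32(value uint32) uint64 {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], value)
	return f.Sum64(b[:])
}

// MultiSum64Uint32 hashes as many values as fit in dst and returns
// the number of hashes written.
func (f fnv1a) MultiSum64Uint32(dst []uint64, src []uint32) int {
	n := len(src)
	if len(dst) < n {
		n = len(dst)
	}
	for i := 0; i < n; i++ {
		dst[i] = f.Sum64Uint32(src[i])
	}
	return n
}

func main() {
	var h fnv1a
	dst := make([]uint64, 2)
	n := h.MultiSum64Uint32(dst, []uint32{1, 2, 3})
	fmt.Println(n, dst[0] == h.Sum64Uint32(1))
}
```

The batch variants let callers hash a whole column of values in one call and pass the result straight to a bulk insert.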

type SplitBlockFilter

type SplitBlockFilter []Block

SplitBlockFilter is an in-memory implementation of the parquet bloom filters.

This type is useful to construct bloom filters that are later serialized to a storage medium.

func MakeSplitBlockFilter

func MakeSplitBlockFilter(data []byte) SplitBlockFilter

MakeSplitBlockFilter constructs a SplitBlockFilter value from the data byte slice.

func (SplitBlockFilter) Block

func (f SplitBlockFilter) Block(x uint64) *Block

Block returns a pointer to the block that the given value hashes to in the bloom filter.
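
The page does not say how a value is mapped to a block. The sketch below shows the mapping described in the Parquet specification, on the assumption that this package uses it: the upper 32 bits of the hash are scaled into the range of block indices with a multiply and a shift, avoiding a modulo. The function name blockIndex is hypothetical.

```go
package main

import "fmt"

// blockIndex scales the upper 32 bits of x into [0, numBlocks).
func blockIndex(x uint64, numBlocks int) int {
	return int(((x >> 32) * uint64(numBlocks)) >> 32)
}

func main() {
	for _, x := range []uint64{0, 1 << 63, 0xFFFFFFFF00000000} {
		fmt.Println(blockIndex(x, 4))
	}
}
```

With four blocks, the three sample hashes land in blocks 0, 2, and 3. The lower 32 bits play no part in block selection, which leaves them free to pick the bits inside the block.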

func (SplitBlockFilter) Bytes

func (f SplitBlockFilter) Bytes() []byte

Bytes converts f to a byte slice.

The returned slice shares the memory of f. The method is intended to be used to serialize the bloom filter to a storage medium.
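
The following sketch shows one way a slice of blocks can be viewed as bytes without copying, and what sharing memory implies for callers. It is an assumption about the technique, not the package's source; the lowercase names are hypothetical, and unsafe.Slice requires Go 1.17 or later.

```go
package main

import (
	"fmt"
	"unsafe"
)

// block and splitBlockFilter are local stand-ins for Block and
// SplitBlockFilter.
type block [8]uint32
type splitBlockFilter []block

// bytes reinterprets the backing array of f as a byte slice. No data is
// copied, so writes through either view are visible through the other.
func (f splitBlockFilter) bytes() []byte {
	if len(f) == 0 {
		return nil
	}
	size := len(f) * int(unsafe.Sizeof(block{}))
	return unsafe.Slice((*byte)(unsafe.Pointer(&f[0])), size)
}

func main() {
	f := make(splitBlockFilter, 2)
	b := f.bytes()
	b[0] = 0xFF // also changes the first word of the first block
	fmt.Println(len(b), f[0][0] != 0)
}
```

Because the views alias each other, a caller that keeps inserting into the filter after obtaining the bytes will see those bytes change; copy the slice first if a stable snapshot is needed.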

func (SplitBlockFilter) Check

func (f SplitBlockFilter) Check(x uint64) bool

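Check tests whether x may be in f. As with any bloom filter, a true result can be a false positive, while a false result is definitive.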

func (SplitBlockFilter) Insert

func (f SplitBlockFilter) Insert(x uint64)

Insert adds x to f.

func (SplitBlockFilter) InsertBulk

func (f SplitBlockFilter) InsertBulk(x []uint64)

InsertBulk adds all values from x into f.

func (SplitBlockFilter) Reset

func (f SplitBlockFilter) Reset()

Reset clears the content of the filter f.

type Word

type Word uint32

Word represents a 32-bit word of a bloom filter block.

type XXH64

type XXH64 struct{}

XXH64 is an implementation of the Hash interface using the XXH64 algorithm.

func (XXH64) MultiSum64Uint128

func (XXH64) MultiSum64Uint128(h []uint64, v [][16]byte) int

func (XXH64) MultiSum64Uint16

func (XXH64) MultiSum64Uint16(h []uint64, v []uint16) int

func (XXH64) MultiSum64Uint32

func (XXH64) MultiSum64Uint32(h []uint64, v []uint32) int

func (XXH64) MultiSum64Uint64

func (XXH64) MultiSum64Uint64(h []uint64, v []uint64) int

func (XXH64) MultiSum64Uint8

func (XXH64) MultiSum64Uint8(h []uint64, v []uint8) int

func (XXH64) Sum64

func (XXH64) Sum64(b []byte) uint64

func (XXH64) Sum64Uint128

func (XXH64) Sum64Uint128(v [16]byte) uint64

func (XXH64) Sum64Uint16

func (XXH64) Sum64Uint16(v uint16) uint64

func (XXH64) Sum64Uint32

func (XXH64) Sum64Uint32(v uint32) uint64

func (XXH64) Sum64Uint64

func (XXH64) Sum64Uint64(v uint64) uint64

func (XXH64) Sum64Uint8

func (XXH64) Sum64Uint8(v uint8) uint64

Directories

Path Synopsis
Package xxhash is an extension of github.com/cespare/xxhash which adds routines optimized to hash arrays of fixed size elements.