Documentation
Overview
Package roaring is an implementation of Roaring Bitmaps in Go. They provide fast compressed bitmap data structures (also called bitsets). They are ideally suited to represent sets of integers over relatively small ranges. See http://roaringbitmap.org for details.
Index
- Constants
- Variables
- func BoundSerializedSizeInBytes(cardinality uint64, universeSize uint64) uint64
- type Bitmap
- func AddOffset(x *Bitmap, offset uint32) (answer *Bitmap)
- func AddOffset64(x *Bitmap, offset int64) (answer *Bitmap)
- func And(x1, x2 *Bitmap) *Bitmap
- func AndNot(x1, x2 *Bitmap) *Bitmap
- func BitmapOf(dat ...uint32) *Bitmap
- func FastAnd(bitmaps ...*Bitmap) *Bitmap
- func FastOr(bitmaps ...*Bitmap) *Bitmap
- func Flip(bm *Bitmap, rangeStart, rangeEnd uint64) *Bitmap
- func FlipInt(bm *Bitmap, rangeStart, rangeEnd int) *Bitmap
- func HeapOr(bitmaps ...*Bitmap) *Bitmap
- func HeapXor(bitmaps ...*Bitmap) *Bitmap
- func New() *Bitmap
- func NewBitmap() *Bitmap
- func Or(x1, x2 *Bitmap) *Bitmap
- func ParAnd(parallelism int, bitmaps ...*Bitmap) *Bitmap
- func ParHeapOr(parallelism int, bitmaps ...*Bitmap) *Bitmap
- func ParOr(parallelism int, bitmaps ...*Bitmap) *Bitmap
- func Xor(x1, x2 *Bitmap) *Bitmap
- func (rb *Bitmap) Add(x uint32)
- func (rb *Bitmap) AddInt(x int)
- func (rb *Bitmap) AddMany(dat []uint32)
- func (rb *Bitmap) AddRange(rangeStart, rangeEnd uint64)
- func (rb *Bitmap) And(x2 *Bitmap)
- func (x1 *Bitmap) AndAny(bitmaps ...*Bitmap)
- func (rb *Bitmap) AndCardinality(x2 *Bitmap) uint64
- func (rb *Bitmap) AndNot(x2 *Bitmap)
- func (rb *Bitmap) CheckedAdd(x uint32) bool
- func (rb *Bitmap) CheckedRemove(x uint32) bool
- func (rb *Bitmap) Checksum() uint64
- func (rb *Bitmap) Clear()
- func (rb *Bitmap) Clone() *Bitmap
- func (rb *Bitmap) CloneCopyOnWriteContainers()
- func (rb *Bitmap) Contains(x uint32) bool
- func (rb *Bitmap) ContainsInt(x int) bool
- func (rb *Bitmap) Equals(o interface{}) bool
- func (rb *Bitmap) Flip(rangeStart, rangeEnd uint64)
- func (rb *Bitmap) FlipInt(rangeStart, rangeEnd int)
- func (bm *Bitmap) Freeze() ([]byte, error)
- func (bm *Bitmap) FreezeTo(buf []byte) (int, error)
- func (rb *Bitmap) FromBase64(str string) (int64, error)
- func (rb *Bitmap) FromBuffer(buf []byte) (p int64, err error)
- func (rb *Bitmap) FrozenView(buf []byte) error
- func (rb *Bitmap) GetCardinality() uint64
- func (rb *Bitmap) GetCopyOnWrite() (val bool)
- func (bm *Bitmap) GetFrozenSizeInBytes() uint64
- func (rb *Bitmap) GetSerializedSizeInBytes() uint64
- func (rb *Bitmap) GetSizeInBytes() uint64
- func (rb *Bitmap) HasRunCompression() bool
- func (rb *Bitmap) Intersects(x2 *Bitmap) bool
- func (rb *Bitmap) IntersectsWithInterval(x, y uint64) bool
- func (rb *Bitmap) IsEmpty() bool
- func (rb *Bitmap) Iterate(cb func(x uint32) bool)
- func (rb *Bitmap) Iterator() IntPeekable
- func (rb *Bitmap) ManyIterator() ManyIntIterable
- func (rb *Bitmap) MarshalBinary() ([]byte, error)
- func (rb *Bitmap) Maximum() uint32
- func (rb *Bitmap) Minimum() uint32
- func (rb *Bitmap) Or(x2 *Bitmap)
- func (rb *Bitmap) OrCardinality(x2 *Bitmap) uint64
- func (rb *Bitmap) Rank(x uint32) uint64
- func (rb *Bitmap) ReadFrom(reader io.Reader, cookieHeader ...byte) (p int64, err error)
- func (rb *Bitmap) Remove(x uint32)
- func (rb *Bitmap) RemoveRange(rangeStart, rangeEnd uint64)
- func (rb *Bitmap) ReverseIterator() IntIterable
- func (rb *Bitmap) RunOptimize()
- func (rb *Bitmap) Select(x uint32) (uint32, error)
- func (rb *Bitmap) SetCopyOnWrite(val bool)
- func (rb *Bitmap) Stats() Statistics
- func (rb *Bitmap) String() string
- func (rb *Bitmap) ToArray() []uint32
- func (rb *Bitmap) ToBase64() (string, error)
- func (rb *Bitmap) ToBytes() ([]byte, error)
- func (rb *Bitmap) UnmarshalBinary(data []byte) error
- func (bm *Bitmap) WriteFrozenTo(wr io.Writer) (int, error)
- func (rb *Bitmap) WriteTo(stream io.Writer) (int64, error)
- func (rb *Bitmap) Xor(x2 *Bitmap)
- type IntIterable
- type IntIterator
- type IntPeekable
- type IntReverseIterator
- type ManyIntIterable
- type ManyIntIterator
- type Statistics
Constants
const (
	// MaxUint32 is the largest uint32 value.
	MaxUint32 = math.MaxUint32

	// MaxRange is one more than the maximum allowed bitmap bit index. For use as an upper
	// bound for ranges.
	MaxRange uint64 = MaxUint32 + 1

	// MaxUint16 is the largest 16 bit unsigned int.
	// This is the largest value an interval16 can store.
	MaxUint16 = math.MaxUint16
)
const FROZEN_COOKIE = 13766
Verbatim specification from CRoaring:
FROZEN SERIALIZATION FORMAT DESCRIPTION
-- (beginning must be aligned by 32 bytes) --
<bitset_data> uint64_t[BITSET_CONTAINER_SIZE_IN_WORDS * num_bitset_containers]
<run_data>    rle16_t[total number of rle elements in all run containers]
<array_data>  uint16_t[total number of array elements in all array containers]
<keys>        uint16_t[num_containers]
<counts>      uint16_t[num_containers]
<typecodes>   uint8_t[num_containers]
<header>      uint32_t
<header> is a 4-byte value which is a bit union of FROZEN_COOKIE (15 bits) and the number of containers (17 bits).
<counts> stores number of elements for every container. Its meaning depends on container type. For array and bitset containers, this value is the container cardinality minus one. For run container, it is the number of rle_t elements (n_runs).
<bitset_data>,<array_data>,<run_data> are flat arrays of elements of all containers of respective type.
<*_data> and <keys> are kept close together because they are not accessed during deserilization. This may reduce IO in case of large mmaped bitmaps. All members have their native alignments during deserilization except <header>, which is not guaranteed to be aligned by 4 bytes.
Variables
var (
	FrozenBitmapInvalidCookie   = errors.New("header does not contain the FROZEN_COOKIE")
	FrozenBitmapBigEndian       = errors.New("loading big endian frozen bitmaps is not supported")
	FrozenBitmapIncomplete      = errors.New("input buffer too small to contain a frozen bitmap")
	FrozenBitmapOverpopulated   = errors.New("too many containers")
	FrozenBitmapUnexpectedData  = errors.New("spurious data in input")
	FrozenBitmapInvalidTypecode = errors.New("unrecognized typecode")
	FrozenBitmapBufferTooSmall  = errors.New("buffer too small")
)
Functions
func BoundSerializedSizeInBytes
BoundSerializedSizeInBytes returns an upper bound on the serialized size in bytes, assuming that one wants to store cardinality integers in [0, universeSize).
Types
type Bitmap
type Bitmap struct {
// contains filtered or unexported fields
}
Bitmap represents a compressed bitmap where you can add integers.
func AddOffset
AddOffset adds the value 'offset' to each and every value in a bitmap, generating a new bitmap in the process.
func AddOffset64
AddOffset64 adds the value 'offset' to each and every value in a bitmap, generating a new bitmap in the process. If offset + element falls outside the range [0, 2^32), that element is dropped.
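The dropping rule of AddOffset64 can be modeled on a plain slice. This is a sketch only: addOffset64 is a hypothetical helper operating on []uint32, not the package's container-based implementation.

```go
package main

import "fmt"

// addOffset64 is a hypothetical slice-based model of AddOffset64: it shifts
// every value by offset and drops results that fall outside [0, 2^32).
func addOffset64(values []uint32, offset int64) []uint32 {
	out := make([]uint32, 0, len(values))
	// Every stored value lies in [0, 2^32), so an offset of 2^32 or more in
	// either direction pushes all of them out of range (and avoids overflow).
	if offset >= 1<<32 || offset <= -(1<<32) {
		return out
	}
	for _, v := range values {
		shifted := int64(v) + offset
		if shifted < 0 || shifted >= 1<<32 {
			continue // outside [0, 2^32): dropped
		}
		out = append(out, uint32(shifted))
	}
	return out
}

func main() {
	values := []uint32{0, 5, 4294967290}
	fmt.Println(addOffset64(values, 10)) // [10 15]
	fmt.Println(addOffset64(values, -3)) // [2 4294967287]
}
```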
func FastAnd
FastAnd computes the intersection between many bitmaps quickly. Compared to the And function, it can take many bitmaps as input, thus saving the trouble of manually calling And many times.
func FastOr
FastOr computes the union between many bitmaps quickly, as opposed to having to call Or repeatedly. It might also be faster than calling Or repeatedly.
func Flip
Flip negates the bits in the given range [rangeStart, rangeEnd): any integer present in both the range and the bitmap is removed, and any integer present in the range but not in the bitmap is added. A new bitmap is returned, leaving the original bitmap unchanged. The function uses 64-bit parameters even though a Bitmap stores 32-bit values, because [0, uint64(0x100000000)) is a valid and meaningful range while uint64(0x100000000) cannot be represented as a 32-bit value.
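To make the half-open range semantics concrete, here is a map-based model of the package-level Flip; flip and sorted are hypothetical helpers. It also shows why the bounds are uint64: an exclusive end of 1<<32 has to be representable.

```go
package main

import (
	"fmt"
	"sort"
)

// flip is a hypothetical map-based model of Flip: it returns a new set in which
// membership of every integer in [rangeStart, rangeEnd) is negated.
// The input set is left unchanged, as with the package-level Flip.
func flip(set map[uint32]bool, rangeStart, rangeEnd uint64) map[uint32]bool {
	// rangeEnd is uint64 so that 1<<32 (one past MaxUint32) can serve as an
	// exclusive bound; clamp anything larger to that limit.
	if rangeEnd > 1<<32 {
		rangeEnd = 1 << 32
	}
	out := make(map[uint32]bool, len(set))
	for v := range set {
		out[v] = true
	}
	for x := rangeStart; x < rangeEnd; x++ {
		v := uint32(x)
		if out[v] {
			delete(out, v) // present in range and set: removed
		} else {
			out[v] = true // present in range only: added
		}
	}
	return out
}

// sorted returns the members of set in increasing order.
func sorted(set map[uint32]bool) []uint32 {
	vals := make([]uint32, 0, len(set))
	for v := range set {
		vals = append(vals, v)
	}
	sort.Slice(vals, func(i, j int) bool { return vals[i] < vals[j] })
	return vals
}

func main() {
	set := map[uint32]bool{1: true, 3: true, 8: true}
	fmt.Println(sorted(flip(set, 2, 5))) // [1 2 4 8]
	fmt.Println(sorted(set))             // [1 3 8] (unchanged)
}
```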
func HeapOr
HeapOr computes the union between many bitmaps quickly using a heap. It might be faster than calling Or repeatedly.
func HeapXor
HeapXor computes the symmetric difference between many bitmaps quickly (as opposed to calling Xor repeatedly). Internally, this function uses a heap. It might be faster than calling Xor repeatedly.
func ParAnd
ParAnd computes the intersection (AND) of all provided bitmaps in parallel, where the parameter "parallelism" determines how many workers are to be used (if it is set to 0, a default number of workers is chosen).
func ParHeapOr
ParHeapOr computes the union (OR) of all provided bitmaps in parallel, where the parameter "parallelism" determines how many workers are to be used (if it is set to 0, a default number of workers is chosen). ParHeapOr uses a heap to compute the union. In rare cases it might be faster than ParOr.
func ParOr
ParOr computes the union (OR) of all provided bitmaps in parallel, where the parameter "parallelism" determines how many workers are to be used (if it is set to 0, a default number of workers is chosen).
func (*Bitmap) AddInt
AddInt adds the integer x to the bitmap (convenience method: the parameter is cast to uint32 and Add is called).
func (*Bitmap) AddRange
AddRange adds the integers in [rangeStart, rangeEnd) to the bitmap. The function uses 64-bit parameters even though a Bitmap stores 32-bit values, because [0, uint64(0x100000000)) is a valid and meaningful range while uint64(0x100000000) cannot be represented as a 32-bit value.
func (*Bitmap) And
And computes the intersection between two bitmaps and stores the result in the current bitmap.
func (*Bitmap) AndAny
AndAny provides a result equivalent to x1.And(FastOr(bitmaps...)). It is optimized to minimize allocations and might also be faster than separate calls.
func (*Bitmap) AndCardinality
AndCardinality returns the cardinality of the intersection between two bitmaps; the bitmaps are not modified.
func (*Bitmap) AndNot
AndNot computes the difference between two bitmaps and stores the result in the current bitmap.
func (*Bitmap) CheckedAdd
CheckedAdd adds the integer x to the bitmap and returns true if it was added (false if the integer was already present).
func (*Bitmap) CheckedRemove
CheckedRemove removes the integer x from the bitmap and returns true if the integer was effectively removed (false if the integer was not present).
func (*Bitmap) Checksum
Checksum computes a hash (currently FNV-1a) for a bitmap that is suitable for using bitmaps as elements in hash sets or as keys in hash maps, as well as for generally quicker comparisons. The implementation is biased towards efficiency on little-endian machines, so expect some extra CPU cycles and memory to be used if your machine is big-endian. Likewise, don't use this to verify integrity unless you're certain you'll load the bitmap on a machine with the same endianness used to create it.
func (*Bitmap) Clear
func (rb *Bitmap) Clear()
Clear resets the Bitmap to be logically empty, but may retain some memory allocations that may speed up future operations.
func (*Bitmap) CloneCopyOnWriteContainers
func (rb *Bitmap) CloneCopyOnWriteContainers()
CloneCopyOnWriteContainers clones all containers which have needCopyOnWrite set to true. This can be used to make sure it is safe to munmap a []byte that the roaring array may still have a reference to, after calling FromBuffer. More generally, this function is useful if you call FromBuffer to construct a bitmap with a backing array buf and then later discard the buf array. Note that you should call CloneCopyOnWriteContainers on all bitmaps that were derived from the FromBuffer bitmap, since they may have dependencies on the buf array as well.
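The hazard CloneCopyOnWriteContainers addresses is ordinary Go slice aliasing. In this hypothetical model, view stands in for a container that borrows bytes from buf, and clone plays the role of copying the data into memory the bitmap owns; neither the type nor its methods exist in this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// view is a hypothetical stand-in for a container that borrows its data
// from an external buffer instead of owning a copy.
type view struct {
	data []byte // aliases the caller's buffer
}

// first decodes the first little-endian uint32 held by the view.
func (v view) first() uint32 { return binary.LittleEndian.Uint32(v.data) }

// clone detaches the view by copying the borrowed bytes into a new slice.
func (v view) clone() view {
	owned := make([]byte, len(v.data))
	copy(owned, v.data)
	return view{data: owned}
}

func main() {
	buf := []byte{42, 0, 0, 0}
	borrowed := view{data: buf}
	owned := borrowed.clone()

	// Reusing buf (or unmapping it) silently affects the borrowed view only.
	buf[0] = 7
	fmt.Println(borrowed.first(), owned.first()) // 7 42
}
```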
func (*Bitmap) ContainsInt
ContainsInt returns true if the integer is contained in the bitmap (this is a convenience method: the parameter is cast to uint32 and Contains is called).
func (*Bitmap) Flip
Flip negates the bits in the given range [rangeStart, rangeEnd): any integer present in both the range and the bitmap is removed, and any integer present in the range but not in the bitmap is added. The function uses 64-bit parameters even though a Bitmap stores 32-bit values, because [0, uint64(0x100000000)) is a valid and meaningful range while uint64(0x100000000) cannot be represented as a 32-bit value.
func (*Bitmap) FromBase64
FromBase64 deserializes a bitmap from Base64.
func (*Bitmap) FromBuffer
FromBuffer creates a bitmap from its serialized version stored in buf.
The format specification is available here: https://github.com/RexLetRock/roaringFormatSpec
The provided byte slice (buf) is expected to be constant. The function makes a best-effort attempt not to copy data. Take care not to modify buf, as doing so will likely result in unexpected program behavior.
Resulting bitmaps are effectively immutable in the following sense: a copy-on-write marker is used so that when you modify the resulting bitmap, copies of selected data (containers) are made. You should *not* change the copy-on-write status of the resulting bitmaps (SetCopyOnWrite).
If buf becomes unavailable, then a bitmap created with FromBuffer would be effectively broken. Furthermore, any bitmap derived from this bitmap (e.g., via Or, And) might also be broken. Thus, before making buf unavailable, you should call CloneCopyOnWriteContainers on all such bitmaps.
func (*Bitmap) FrozenView
FrozenView creates a static view of a serialized bitmap stored in buf. It uses CRoaring's frozen bitmap format.
The format specification is available here: https://github.com/RoaringBitmap/CRoaring/blob/2c867e9f9c9e2a3a7032791f94c4c7ae3013f6e0/src/roaring.c#L2756-L2783
The provided byte slice (buf) is expected to be constant. The function makes a best-effort attempt not to copy data. Only little endian is supported: the function returns an error if it detects a big-endian serialized file. Take care not to modify buf, as doing so will likely result in unexpected program behavior. If the buffer comes from a memory map, it's advisable to give it read-only permissions, either at creation or by calling Mprotect from the golang.org/x/sys/unix package.
Resulting bitmaps are effectively immutable in the following sense: a copy-on-write marker is used so that when you modify the resulting bitmap, copies of selected data (containers) are made. You should *not* change the copy-on-write status of the resulting bitmaps (SetCopyOnWrite).
If buf becomes unavailable, then a bitmap created with FrozenView would be effectively broken. Furthermore, any bitmap derived from this bitmap (e.g., via Or, And) might also be broken. Thus, before making buf unavailable, you should call CloneCopyOnWriteContainers on all such bitmaps.
func (*Bitmap) GetCardinality
GetCardinality returns the number of integers contained in the bitmap.
func (*Bitmap) GetCopyOnWrite
GetCopyOnWrite gets this bitmap's copy-on-write property.
func (*Bitmap) GetFrozenSizeInBytes
func (*Bitmap) GetSerializedSizeInBytes
GetSerializedSizeInBytes computes the serialized size in bytes of the Bitmap. It should correspond to the number of bytes written when invoking WriteTo. You can expect this function to be much cheaper computationally than WriteTo.
func (*Bitmap) GetSizeInBytes
GetSizeInBytes estimates the memory usage of the Bitmap. Note that this might differ slightly from the number of bytes required for persistent storage.
func (*Bitmap) HasRunCompression
HasRunCompression returns true if the bitmap benefits from run compression.
func (*Bitmap) Intersects
Intersects checks whether two bitmaps intersect; the bitmaps are not modified.
func (*Bitmap) IntersectsWithInterval
IntersectsWithInterval checks whether a bitmap 'rb' and a half-open interval '[x,y)' intersect.
func (*Bitmap) IsEmpty
IsEmpty returns true if the Bitmap is empty (this is faster than checking GetCardinality() == 0).
func (*Bitmap) Iterate
Iterate iterates over the bitmap, calling the given callback with each value in the bitmap. If the callback returns false, the iteration is halted. The iteration results are undefined if the bitmap is modified (e.g., with Add or Remove). There is no guarantee as to the order in which the values are iterated.
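With the real package the call has the form rb.Iterate(func(x uint32) bool { ... }). The sketch below models the early-exit contract on a slice with a hypothetical iterate helper; because Iterate makes no ordering promise, callers should not rely on the order the model happens to use.

```go
package main

import "fmt"

// iterate is a hypothetical slice-based model of Bitmap.Iterate: it calls cb
// for each value and stops as soon as cb returns false.
func iterate(values []uint32, cb func(x uint32) bool) {
	for _, v := range values {
		if !cb(v) {
			return
		}
	}
}

func main() {
	values := []uint32{2, 4, 6, 8, 10}
	var seen []uint32
	iterate(values, func(x uint32) bool {
		seen = append(seen, x)
		return len(seen) < 3 // returning false halts after three values
	})
	fmt.Println(seen) // [2 4 6]
}
```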
func (*Bitmap) Iterator
func (rb *Bitmap) Iterator() IntPeekable
Iterator creates a new IntPeekable to iterate over the integers contained in the bitmap, in sorted order; the iterator becomes invalid if the bitmap is modified (e.g., with Add or Remove).
func (*Bitmap) ManyIterator
func (rb *Bitmap) ManyIterator() ManyIntIterable
ManyIterator creates a new ManyIntIterable to iterate over the integers contained in the bitmap, in sorted order; the iterator becomes invalid if the bitmap is modified (e.g., with Add or Remove).
func (*Bitmap) MarshalBinary
MarshalBinary implements the encoding.BinaryMarshaler interface for the bitmap (same as ToBytes).
func (*Bitmap) Maximum
Maximum gets the largest value stored in this roaring bitmap; it assumes that the bitmap is not empty.
func (*Bitmap) Minimum
Minimum gets the smallest value stored in this roaring bitmap; it assumes that the bitmap is not empty.
func (*Bitmap) Or
Or computes the union between two bitmaps and stores the result in the current bitmap.
func (*Bitmap) OrCardinality
OrCardinality returns the cardinality of the union between two bitmaps; the bitmaps are not modified.
func (*Bitmap) Rank
Rank returns the number of integers that are smaller than or equal to x (Rank(infinity) would be GetCardinality()). If you pass the smallest value, you get 1. If you pass a value that is smaller than the smallest value, you get 0. Note that this function differs in convention from the Select function, since it returns 1 and not 0 on the smallest value.
func (*Bitmap) ReadFrom
ReadFrom reads a serialized version of this bitmap from stream. The format is compatible with other RoaringBitmap implementations (Java, C) and is documented at https://github.com/RexLetRock/roaringFormatSpec
Because an io.Reader is a stream and cannot be read twice, the optional cookieHeader parameter accepts the 4 bytes that roaring64.ReadFrom has already read. It is not necessary to pass cookieHeader when calling roaring.ReadFrom to read 32-bit roaring data directly.
func (*Bitmap) RemoveRange
RemoveRange removes the integers in [rangeStart, rangeEnd) from the bitmap. The function uses 64-bit parameters even though a Bitmap stores 32-bit values, because [0, uint64(0x100000000)) is a valid and meaningful range while uint64(0x100000000) cannot be represented as a 32-bit value.
func (*Bitmap) ReverseIterator
func (rb *Bitmap) ReverseIterator() IntIterable
ReverseIterator creates a new IntIterable to iterate over the integers contained in the bitmap, in reverse (descending) sorted order; the iterator becomes invalid if the bitmap is modified (e.g., with Add or Remove).
func (*Bitmap) RunOptimize
func (rb *Bitmap) RunOptimize()
RunOptimize attempts to further compress the runs of consecutive values found in the bitmap.
func (*Bitmap) Select
Select returns the xth integer in the bitmap. If you pass 0, you get the smallest element. Note that this function differs in convention from the Rank function, which returns 1 on the smallest value.
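The differing Rank and Select conventions can be checked on a sorted slice. In this model (rank and selectNth are hypothetical helpers; select is a Go keyword), Rank counts values less than or equal to x and Select is 0-based, so Select(Rank(x)-1) returns x for any x in the set.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// rank models Bitmap.Rank on a sorted slice: the number of values <= x.
func rank(sorted []uint32, x uint32) uint64 {
	// sort.Search finds the first index whose value is > x.
	return uint64(sort.Search(len(sorted), func(i int) bool { return sorted[i] > x }))
}

// selectNth models Bitmap.Select: the value at 0-based position i.
func selectNth(sorted []uint32, i uint32) (uint32, error) {
	if uint64(i) >= uint64(len(sorted)) {
		return 0, errors.New("position out of range")
	}
	return sorted[i], nil
}

func main() {
	values := []uint32{10, 20, 30}
	fmt.Println(rank(values, 9), rank(values, 10), rank(values, 25)) // 0 1 2

	v, _ := selectNth(values, 0)
	fmt.Println(v) // 10

	// The two conventions line up as Select(Rank(x)-1) == x for x in the set.
	w, _ := selectNth(values, uint32(rank(values, 20)-1))
	fmt.Println(w) // 20
}
```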
func (*Bitmap) SetCopyOnWrite
SetCopyOnWrite sets this bitmap to use copy-on-write, so that copies are fast and memory conscious, if the parameter is true; otherwise the default applies and hard copies are made (copy-on-write requires extra care in a threaded context). Calling SetCopyOnWrite(true) on a bitmap created with FromBuffer is unsafe.
func (*Bitmap) Stats
func (rb *Bitmap) Stats() Statistics
Stats returns details on container type usage in a Statistics struct.
func (*Bitmap) ToArray
ToArray creates a new slice containing all of the integers stored in the Bitmap in sorted order.
func (*Bitmap) ToBytes
ToBytes returns a byte slice corresponding to what is written when calling WriteTo.
func (*Bitmap) UnmarshalBinary
UnmarshalBinary implements the encoding.BinaryUnmarshaler interface for the bitmap.
func (*Bitmap) WriteTo
WriteTo writes a serialized version of this bitmap to stream. The format is compatible with other RoaringBitmap implementations (Java, C) and is documented here: https://github.com/RexLetRock/roaringFormatSpec
type IntIterable
IntIterable allows you to iterate over the values in a Bitmap.
type IntIterator
type IntIterator = intIterator
IntIterator is meant to allow you to iterate through the values of a bitmap; see Initialize(a *Bitmap).
type IntPeekable
type IntPeekable interface {
	IntIterable

	// PeekNext peeks the next value without advancing the iterator
	PeekNext() uint32

	// AdvanceIfNeeded advances as long as the next value is smaller than minval
	AdvanceIfNeeded(minval uint32)
}
IntPeekable allows you to look at the next value without advancing, and to advance as long as the next value is smaller than minval.
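A slice-backed model of IntPeekable shows how PeekNext and AdvanceIfNeeded combine, for example to skip ahead while intersecting two streams. It assumes that IntIterable consists of HasNext() bool and Next() uint32, as in the upstream roaring package; sliceIter and the local peekable interface are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// peekable is a local copy of the method set assumed for IntPeekable.
type peekable interface {
	HasNext() bool
	Next() uint32
	PeekNext() uint32
	AdvanceIfNeeded(minval uint32)
}

// sliceIter is a hypothetical iterator over a sorted, duplicate-free slice.
type sliceIter struct {
	vals []uint32
	pos  int
}

// Compile-time check that the model has the assumed method set.
var _ peekable = (*sliceIter)(nil)

func (it *sliceIter) HasNext() bool { return it.pos < len(it.vals) }

// Next returns the current value and advances; call only when HasNext is true.
func (it *sliceIter) Next() uint32 {
	v := it.vals[it.pos]
	it.pos++
	return v
}

// PeekNext returns the next value without advancing; call only when HasNext is true.
func (it *sliceIter) PeekNext() uint32 { return it.vals[it.pos] }

// AdvanceIfNeeded skips every remaining value smaller than minval, using
// binary search because the slice is sorted.
func (it *sliceIter) AdvanceIfNeeded(minval uint32) {
	rest := it.vals[it.pos:]
	it.pos += sort.Search(len(rest), func(i int) bool { return rest[i] >= minval })
}

func main() {
	it := &sliceIter{vals: []uint32{1, 4, 9, 16, 25}}
	fmt.Println(it.PeekNext(), it.Next()) // 1 1
	it.AdvanceIfNeeded(10)
	var rest []uint32
	for it.HasNext() {
		rest = append(rest, it.Next())
	}
	fmt.Println(rest) // [16 25]
}
```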
type IntReverseIterator
type IntReverseIterator = intReverseIterator
IntReverseIterator is meant to allow you to iterate through the values of a bitmap; see Initialize(a *Bitmap).
type ManyIntIterable
type ManyIntIterable interface {
	// NextMany fills buf up with values, returns how many values were returned
	NextMany(buf []uint32) int

	// NextMany64 fills up buf with 64 bit values, uses hs as a mask (OR), returns how many values were returned
	NextMany64(hs uint64, buf []uint64) int
}
ManyIntIterable allows you to iterate over the values in a Bitmap.
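Batch iteration with NextMany typically loops until it returns 0, reusing one buffer. The model below is hypothetical (manyIter is not the package's ManyIntIterator); it only mirrors the two method signatures of ManyIntIterable, with NextMany64 OR-ing hs into each value as the interface comment describes.

```go
package main

import "fmt"

// manyIter is a hypothetical slice-backed model of ManyIntIterable.
type manyIter struct {
	vals []uint32
	pos  int
}

// NextMany copies up to len(buf) values into buf and reports how many it
// wrote; 0 means the iterator is exhausted.
func (it *manyIter) NextMany(buf []uint32) int {
	n := copy(buf, it.vals[it.pos:])
	it.pos += n
	return n
}

// NextMany64 does the same for 64-bit output, OR-ing hs into each value.
func (it *manyIter) NextMany64(hs uint64, buf []uint64) int {
	n := 0
	for n < len(buf) && it.pos < len(it.vals) {
		buf[n] = hs | uint64(it.vals[it.pos])
		n++
		it.pos++
	}
	return n
}

func main() {
	it := &manyIter{vals: []uint32{1, 2, 3, 4, 5}}
	buf := make([]uint32, 2) // deliberately small to force several batches
	for n := it.NextMany(buf); n > 0; n = it.NextMany(buf) {
		fmt.Println(buf[:n]) // [1 2], then [3 4], then [5]
	}
}
```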
type ManyIntIterator
type ManyIntIterator = manyIntIterator
ManyIntIterator is meant to allow you to iterate through the values of a bitmap; see Initialize(a *Bitmap).
type Statistics
type Statistics struct {
	Cardinality uint64
	Containers  uint64

	ArrayContainers      uint64
	ArrayContainerBytes  uint64
	ArrayContainerValues uint64

	BitmapContainers      uint64
	BitmapContainerBytes  uint64
	BitmapContainerValues uint64

	RunContainers      uint64
	RunContainerBytes  uint64
	RunContainerValues uint64
}
Statistics provides details on the container types in use.
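The container counts reported in Statistics come from Roaring's layout: each 32-bit value is split into a 16-bit key (the high bits), which selects a container, and 16 low bits stored inside it. The sketch below classifies containers using the usual Roaring threshold of 4096 values between array and bitmap containers; it ignores run containers, and containerStats and stats are hypothetical names.

```go
package main

import "fmt"

// stats is a hypothetical, simplified counterpart of Statistics that ignores
// run containers (which only appear after RunOptimize).
type stats struct {
	Containers, ArrayContainers, BitmapContainers uint64
}

// arrayMaxSize is the usual Roaring threshold: containers with at most 4096
// values are stored as sorted uint16 arrays, larger ones as 65536-bit bitmaps.
const arrayMaxSize = 4096

func containerStats(values []uint32) stats {
	// Group distinct values by their high 16 bits; each group is one container.
	counts := make(map[uint16]int)
	seen := make(map[uint32]bool)
	for _, v := range values {
		if !seen[v] {
			seen[v] = true
			counts[uint16(v>>16)]++
		}
	}
	var s stats
	for _, n := range counts {
		s.Containers++
		if n <= arrayMaxSize {
			s.ArrayContainers++
		} else {
			s.BitmapContainers++
		}
	}
	return s
}

func main() {
	var values []uint32
	for i := uint32(0); i < 5000; i++ {
		values = append(values, i) // 5000 values under key 0: bitmap container
	}
	values = append(values, 1<<16, 1<<16+1) // 2 values under key 1: array container
	fmt.Printf("%+v\n", containerStats(values))
	// {Containers:2 ArrayContainers:1 BitmapContainers:1}
}
```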
Source Files
- arraycontainer.go
- bitmapcontainer.go
- clz.go
- ctz.go
- fastaggregation.go
- manyiterator.go
- parallel.go
- popcnt.go
- popcnt_generic.go
- popcnt_slices.go
- priorityqueue.go
- roaring.go
- roaringarray.go
- runcontainer.go
- serialization.go
- serialization_littleendian.go
- setutil.go
- setutil_generic.go
- shortiterator.go
- util.go