watrix

package module
v0.0.0-...-2baf17d Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 15, 2024 License: MIT Imports: 4 Imported by: 0

README

watrix

Caution

Currently, the API is unstable because I'm experimenting various queries.

Build Status

watrix is a Go package for myriad array operations using wavelet matrix (wavelet tree).

watrix stores a non-negative intger array V[0...n), 0 <= V[i] < s and support almost all operations in O(log s) time (not depends on num) using at most (n * log_2 s) bits plus small overheads for storing auxiually indices.

Usage

GoDoc

See godoc for reference. It was originally folked from github.com/hillbig/waveletTree, but compatibility is not maintained.

Benchmark

New

  • 2.8 GHz Xeon [in AWS]
  • 32 cores (only a single thread is used)

Note: IgnoreLSBs version performs quite well.

go test -v -bench . -benchmem

{N = 10000000 is used in the tests below}   

BenchmarkWM_Build-2    						1	11816507411 ns/op	10731534600 B/op	    7925 allocs/op

BenchmarkWM_Lookup-2                	   50000	     24509 ns/op	       0 B/op	       0 allocs/op
BenchmarkWM_Rank-2                  	   50000	     27384 ns/op	       0 B/op	       0 allocs/op
BenchmarkWM_RangedRankIgnoreLSBs-2  	  100000	     22119 ns/op	       0 B/op	       0 allocs/op
BenchmarkWM_RankLessThan-2          	   50000	     26901 ns/op	       0 B/op	       0 allocs/op

BenchmarkWM_Select-2                	   30000	     43022 ns/op	       0 B/op	       0 allocs/op
BenchmarkWM_RangedSelectIgnoreLSBs-2	   30000	     47325 ns/op	       0 B/op	       0 allocs/op

BenchmarkWM_Quantile-2              	   50000	     23663 ns/op	       0 B/op	       0 allocs/op

BenchmarkRaw_Lookup-2               	20000000	       124 ns/op	       0 B/op	       0 allocs/op

BenchmarkRaw_Rank-2                 	     200	   5494451 ns/op	       0 B/op	       0 allocs/op
BenchmarkRaw_Select-2               	     100	  17108476 ns/op	       0 B/op	       0 allocs/op
BenchmarkRaw_Quantile-2             	      50	  20772400 ns/op	      32 B/op	       1 allocs/op

Old

  • 1.7 GHz Intel Core i7
  • OS X 10.9.2
  • 8GB 1600 MHz DDR3
  • go version go1.3 darwin/amd64

The results shows that RSDic operations require always (almost) constant time with regard to the length and one's ratio.

go test -bench=.

// Build a watrix for an integer array of length 10^6 with s = 2^64
BenchmarkWTBuild1M	       1	1455321650 ns/op
// 1.455 micro sec per an interger

// A watrix for an integer array of length 10M (10^7) with s = 2^64
BenchmarkWTBuild10M	       1	1467061166 ns/op
BenchmarkWTLookup10M	  100000	     29319 ns/op
BenchmarkWTRank10M	  100000	     28278 ns/op
BenchmarkWTSelect10M	   50000	     50250 ns/op
BenchmarkWTQuantile10M	  100000	     28852 ns/op

// An array []uint64 of length 10M (10^7) for comparison
BenchmarkRawLookup10M	20000000	       109 ns/op
BenchmarkRa
wRank10M	     500	   4683822 ns/op
BenchmarkRawSelect10M	     500	   6085992 ns/op
BenchmarkRawQuantile10M	     100	  44362885 ns/op

Documentation

Overview

Package watrix provides a wavelet matrix (wavelet tree) that supports Rank/Select and other useful queries in O(1).

WaveletMatrix is a data structure that efficiently performs various queries on an integer value sequence.

Most of the queries are done in O(1) regardless of the length of the sequence. Query time depends on the number of bits of the stored values. Querying on 16-bit value sequence is 4-times faster than that on 64-bit value sequence.

It's based on github.com/hillbig/waveletTree, and added various queries including ranged and ignoreBits version of Rank()/Select().

Index

Constants

View Source
const (
	// OpEqual is used in RangedRankOp()
	OpEqual = iota
	// OpLessThan is used in RangedRankOp()
	OpLessThan
	// OpMoreThan is used in RangedRankOp()
	OpMoreThan
	// OpMax is upper boundary for OpXXXX constants
	OpMax
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Range

type Range struct {
	Beg uint64
	End uint64
}

Range represents a range [Beg, End). Only valid when Beg <= End.

type WaveletMatrix

type WaveletMatrix struct {
	// contains filtered or unexported fields
}

WaveletMatrix is a data structure that efficiently performs various queries on an integer value sequence.

Most of the queries are done in O(1) regardless of the length of the sequence. Query time depends on the number of bits of the data stored. Querying on 16-bit value sequence is 4-times faster than that on 64-bit value sequence.

func (*WaveletMatrix) Dim

func (wm *WaveletMatrix) Dim() uint64

Dim returns (max. of T[0...Num) + 1)

func (*WaveletMatrix) Intersect

func (wm *WaveletMatrix) Intersect(ranges []Range, k int) []uint64

Intersect returns values that occur at least k ranges.

func (*WaveletMatrix) Lookup

func (wm *WaveletMatrix) Lookup(pos uint64) uint64

Lookup returns T[pos]

func (*WaveletMatrix) LookupAndRank

func (wm *WaveletMatrix) LookupAndRank(pos uint64) (uint64, uint64)

LookupAndRank returns T[pos] and Rank(pos, T[pos]) in one call. Faster than calling Lookup and Rank separately.

func (*WaveletMatrix) MarshalBinary

func (wm *WaveletMatrix) MarshalBinary() (out []byte, err error)

MarshalBinary encodes WaveletMatrix into a binary form and returns the result.

func (*WaveletMatrix) MarshalBinaryFile

func (wm *WaveletMatrix) MarshalBinaryFile(outpath string) error

MarshalBinary encodes WaveletMatrix into a binary form and writes it to a file.

func (*WaveletMatrix) Num

func (wm *WaveletMatrix) Num() uint64

Num return the number of values in T

func (*WaveletMatrix) Quantile

func (wm *WaveletMatrix) Quantile(posRange Range, k uint64) uint64

Quantile returns (k+1)th smallest value in T[posRange.Beg, posRange.End).

func (*WaveletMatrix) RangedRankIgnoreLSBs

func (wm *WaveletMatrix) RangedRankIgnoreLSBs(posRange Range, val, ignoreBits uint64) (rank uint64)

RangedRankIgnoreLSBs searches T[posRange.Beg, posRange.End) and returns the number of c that matches the val.

If ignoreBits > 0, ignoreBits-bit portion from LSB are not considered for the match. This behavior is useful for IP address prefix search such as 192.168.10.0/24 (ignoreBits in this case, is 8).

func (*WaveletMatrix) RangedRankOp

func (wm *WaveletMatrix) RangedRankOp(posRange Range, val uint64, op int) (rankResult uint64)

RangedRankOp returns the number of c that satisfies 'c op val' in T[posRange.Beg, posRange.End). The op should be one of {OpEqual, OpLessThan, OpMoreThan}.

func (*WaveletMatrix) RangedRankRange

func (wm *WaveletMatrix) RangedRankRange(posRange Range, valueRange Range) (rank uint64)

RangedRankRange searches T[posRange.Beg, posRange.End) and returns the number of c that falls within valueRange i.e. [valueRange.Beg, valueRange.End).

func (*WaveletMatrix) RangedSelect

func (wm *WaveletMatrix) RangedSelect(posRange Range, rank uint64, val uint64) uint64

RangedSelect takes T[posRange) and returns the position of rank+1'th val. If no match has been found, it returns posRange.End.

func (*WaveletMatrix) RangedSelectIgnoreLSBs

func (wm *WaveletMatrix) RangedSelectIgnoreLSBs(posRange Range, rank, val, ignoreBits uint64) (position uint64)

RangedSelectIgnoreLSBs searches T[posRange.Beg, posRange.End) and returns the position of (rank+1)'th c that matches the val. If no match has been found, it returns posRange.End.

If ignoreBits > 0, ignoreBits-bit portion from LSB are not considered for the match. This behavior is useful for IP address prefix search such as 192.168.10.0/24 (ignoreBits in this case, is 8).

func (*WaveletMatrix) Rank

func (wm *WaveletMatrix) Rank(pos uint64, val uint64) (rank uint64)

Rank returns the number of c (== val) in T[0...pos)

func (*WaveletMatrix) RankLessThan

func (wm *WaveletMatrix) RankLessThan(pos uint64, val uint64) (rankLessThan uint64)

RankLessThan returns the number of c (< val) in T[0...pos)

func (*WaveletMatrix) RankMoreThan

func (wm *WaveletMatrix) RankMoreThan(pos uint64, val uint64) (rankLessThan uint64)

RankMoreThan returns the number of c (> val) in T[0...pos)

func (*WaveletMatrix) Select

func (wm *WaveletMatrix) Select(rank uint64, val uint64) (position uint64)

Select returns the position of (rank+1)-th val in T. If no match has been found, it returns Num().

func (*WaveletMatrix) UnmarshalBinary

func (wm *WaveletMatrix) UnmarshalBinary(in []byte) (err error)

UnmarshalBinary decodes WaveletMatrix from a binary form generated MarshalBinary.

func (*WaveletMatrix) UnmarshalBinaryFile

func (wm *WaveletMatrix) UnmarshalBinaryFile(inpath string) error

UnmarshalBinaryFile decodes WaveletMatrix from a binary file generated MarshalBinaryFile.

type WaveletMatrixBuilder

type WaveletMatrixBuilder struct {
	// contains filtered or unexported fields
}

WaveletMatrixBuilder builds WaveletMatrix from integer array. A user calls PushBack()s followed by Build().

func NewBuilder

func NewBuilder() *WaveletMatrixBuilder

NewBuilder returns Builder

func (*WaveletMatrixBuilder) Build

func (wmb *WaveletMatrixBuilder) Build() *WaveletMatrix

Build constructs WaveletMatrix data structure

func (*WaveletMatrixBuilder) PushBack

func (wmb *WaveletMatrixBuilder) PushBack(val uint64)

PushBack append a value to the builder.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL