indexedhashes

package v0.0.13
Published: Feb 3, 2025 License: MIT
README

package indexedhashes

indexedhashes is a key-value database mapping unique sha256 hashes to counting integers.

  • indexedhashes holds a sequence of unique sha256 hashes for you.
  • They are indexed 0, 1, 2, ...
  • You can (of course) supply an index and look up the corresponding hash previously stored (i.e., an array)
  • But you can also supply a hash and quickly look up the index it is stored at (i.e., a dictionary)
Files
  • The hashes are stored in three files
  • FileName.hsh - the growing raw hashes in sequence
  • FileName.lkp - a fixed-size lookup file indexed by the "partial hash" (the LSBs) of each full hash
  • FileName.cls - a growing linked list of chunks dealing with collisions: cases where two hashes share the same partial hash
Parameters
  • partialHashBitCount - The number of LSBs of a 256-bit hash that constitute a partial hash
  • entryByteCount - The number of bytes that constitute an entry in the lookup file, or in a chunk
  • collisionsPerChunk - The number of collisions stored in a chunk

I've calculated/chosen some parameters for use with Bitcoin, suitable for the number of block hashes and transaction hashes in the blockchain as of 2023:

For Block Hashes: partialHashBitCount = 30, entryByteCount = 3, collisionsPerChunk = 3

For Transaction Hashes: partialHashBitCount = 30, entryByteCount = 4, collisionsPerChunk = 3
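As a sanity check on these choices, the fixed lookup file occupies 2^partialHashBitCount * entryByteCount bytes (per the file-format notes below). This illustrative helper (not part of the package) evaluates that for the Bitcoin parameters:

```go
package main

import "fmt"

// lookupFileSize returns the size in bytes of the fixed-size .lkp file:
// 2^partialHashBitCount entries, each entryByteCount bytes long.
func lookupFileSize(partialHashBitCount, entryByteCount int64) int64 {
	return (int64(1) << partialHashBitCount) * entryByteCount
}

func main() {
	// Block hashes: partialHashBitCount=30, entryByteCount=3 -> 3 GiB
	fmt.Println(lookupFileSize(30, 3)) // 3221225472
	// Transaction hashes: partialHashBitCount=30, entryByteCount=4 -> 4 GiB
	fmt.Println(lookupFileSize(30, 4)) // 4294967296
}
```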

How to use indexedhashes package

1. Instantiate a ConcreteHashStoreCreator (a factory)
import (
    "crypto/sha256"
    "log"

    "github.com/KitchenMishap/pudding-shed/indexedhashes"
)

phbc := int64(30) // Partial hash bit count
ebc := int64(4)   // Entry byte count
cpc := int64(3)   // Collisions per chunk
// The final two arguments are useMemFileForLookup and useAppendOptimizedForHashes
creator, err := indexedhashes.NewConcreteHashStoreCreator("FileName", "FolderName",
    phbc, ebc, cpc, false, false)
if err != nil {
    log.Println(err)
}
2. Create the hash store files
err = creator.CreateHashStore()
if err != nil {
    log.Println(err)
}
3. Open the hash store for read/write
store, err := creator.OpenHashStore()
if err != nil {
    log.Println(err)
}
4. Create some hashes of strings using package crypto/sha256
h0 := indexedhashes.Sha256{}
h := sha256.New()
h.Write([]byte("Hello"))
copy(h0[:], h.Sum(nil))

h1 := indexedhashes.Sha256{}
h = sha256.New()
h.Write([]byte("World"))
copy(h1[:], h.Sum(nil))
5. Store the hashes
index, err := store.AppendHash(&h0)
if err != nil {
    log.Println(err)
}
index, err = store.AppendHash(&h1)
if err != nil {
    log.Println(err)
}
6. Read back the hash at index 0
hash := indexedhashes.Sha256{}
err = store.GetHashAtIndex(0, &hash)
if err != nil {
    log.Println(err)
}
7. Find the index of hash h1
index, err = store.IndexOfHash(&h1)
if err != nil {
    log.Println(err)
}
log.Println(index)

How it works - file formats

1. Hashes file, FileName.hsh

This is simply the binary 256-bit full hashes stored in sequence.

2. Lookup file, FileName.lkp
  • This is a large fixed size file, initially full of zeroes.
  • Each entry is entryByteCount bytes long
  • It is indexed by partial hash, which is the LSBs of a full 256-bit hash
  • There are partialHashBitCount bits in a partial hash
  • Thus the file size is 2^partialHashBitCount * entryByteCount bytes
  • An entry of zero means there are no stored hashes having the given partial hash (LSBs)
  • If the MSB of an entry is unset, then the entry is the only matching hash's index PLUS ONE
  • If the MSB is set, then there are multiple stored hashes matching the given partial hash
  • In this case, the lower bits are a chunk index PLUS ONE and you must then consult the collisions file, starting at that chunk
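The entry encoding above can be sketched as pure logic over an entry already read into an unsigned integer (the on-disk byte order is not specified here, so this hypothetical helper decodes only the semantics):

```go
package main

import "fmt"

// decodeLookupEntry interprets a lookup-file entry of entryByteCount bytes,
// supplied as an unsigned integer. It returns one of:
//   kind "empty":     no stored hash has this partial hash
//   kind "hash":      value is the index of the single matching hash
//   kind "collision": value is the index of the first chunk in FileName.cls
func decodeLookupEntry(entry uint64, entryByteCount int) (kind string, value int64) {
	if entry == 0 {
		return "empty", 0
	}
	msb := uint64(1) << (entryByteCount*8 - 1)
	if entry&msb == 0 {
		return "hash", int64(entry) - 1 // entry is hash index PLUS ONE
	}
	return "collision", int64(entry&^msb) - 1 // lower bits are chunk index PLUS ONE
}

func main() {
	fmt.Println(decodeLookupEntry(0, 3))        // empty 0
	fmt.Println(decodeLookupEntry(42, 3))       // hash 41
	fmt.Println(decodeLookupEntry(0x800005, 3)) // collision 4
}
```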
3. Collisions file, FileName.cls
  • This is a linked list of chunks, each containing collisionsPerChunk entries, followed by a link
  • Each chunk pertains to a collision; that is, one partial hash that matches multiple stored full hashes in the Hashes file
  • If a chunk overflows, it will end with a link index to another chunk that continues documenting the collision
  • Each entry in a chunk, as for the Lookup file, and including the final link, is entryByteCount bytes long
  • Each entry in a chunk is either a hash index PLUS ONE, or a zero
  • A non-zero entry is the hash index PLUS ONE for one of the hashes that matches the partial hash
  • We must continue through the chunk, and any linked chunks, as there may be more than one full hash in the collision
  • If we encounter a zero entry, then we have encountered all the matching hashes; we must check which full hash matches
  • Once we've traversed collisionsPerChunk entries in a chunk, we must read and follow the link that follows them
  • A link is a chunk index PLUS ONE, and we can follow it by calculating a file Seek
  • A link of zero represents the end of the linked list, so all potential hash indices have been encountered
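The traversal rules above can be sketched over a collisions file modelled as a flat slice of already-decoded integer entries, where chunk i occupies positions i*(collisionsPerChunk+1) through its link entry (a hypothetical model; the real package reads entryByteCount-byte entries via file seeks):

```go
package main

import "fmt"

// candidateIndices walks the chunk linked list starting at startChunk and
// returns every candidate hash index in the collision. Each chunk is
// collisionsPerChunk entries followed by one link entry.
func candidateIndices(entries []uint64, collisionsPerChunk int, startChunk int64) []int64 {
	var out []int64
	chunk := startChunk
	for {
		base := chunk * int64(collisionsPerChunk+1)
		for i := 0; i < collisionsPerChunk; i++ {
			e := entries[base+int64(i)]
			if e == 0 {
				return out // zero entry: all matching hashes seen
			}
			out = append(out, int64(e)-1) // entry is hash index PLUS ONE
		}
		link := entries[base+int64(collisionsPerChunk)]
		if link == 0 {
			return out // zero link: end of the linked list
		}
		chunk = int64(link) - 1 // link is chunk index PLUS ONE
	}
}

func main() {
	// Two chunks, collisionsPerChunk = 3:
	// chunk 0 holds indices-plus-one {8, 15, 23}, link 2 -> chunk 1
	// chunk 1 holds {42}, then a zero entry terminates the collision
	entries := []uint64{8, 15, 23, 2, 42, 0, 0, 0}
	fmt.Println(candidateIndices(entries, 3, 0)) // [7 14 22 41]
}
```

As the notes above say, each returned index is only a candidate: the caller must read the full hash from the Hashes file to see which one actually matches.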

Documentation

Index

Constants

View Source
const ZEROBUF = 32 // Arbitrary number. Should be enough (we do check)

Variables

This section is empty.

Functions

func BiggestAddressPlusOne added in v0.0.12

func BiggestAddressPlusOne(hashDivider uint64) int64

func CreateHashEmptyFiles added in v0.0.12

func CreateHashEmptyFiles() error

func CreateHashIndexFiles added in v0.0.12

func CreateHashIndexFiles() error

func HashHexToSha256

func HashHexToSha256(hexAscii string, sha256 *Sha256) error

func HashSha256ToHexString

func HashSha256ToHexString(hash *Sha256) string

func NewHashEntriesList added in v0.0.12

func NewHashEntriesList(address uint64) *hashEntriesList

func NewUniformHashStoreCreatorAndPreloader added in v0.0.12

func NewUniformHashStoreCreatorAndPreloader(
	folder string, name string,
	hashCountEstimate int64, digitsPerFolder int,
	gigabytesMem int64) (HashStoreCreator, *MultipassPreloader)

func NewUniformHashStoreCreatorAndPreloaderFromFile added in v0.0.12

func NewUniformHashStoreCreatorAndPreloaderFromFile(
	folder string, name string, gigabytesMem int64) (*UniformHashStoreCreator, *MultipassPreloader, error)

Types

type BasicHashStore

type BasicHashStore struct {
	// contains filtered or unexported fields
}

func NewBasicHashStore

func NewBasicHashStore(hashFile *wordfile.HashFile) *BasicHashStore

func (*BasicHashStore) AppendHash

func (bhs *BasicHashStore) AppendHash(hash *Sha256) (int64, error)

func (*BasicHashStore) Close

func (bhs *BasicHashStore) Close() error

func (*BasicHashStore) CountHashes

func (bhs *BasicHashStore) CountHashes() (int64, error)

func (*BasicHashStore) GetHashAtIndex

func (bhs *BasicHashStore) GetHashAtIndex(index int64, hash *Sha256) error

func (*BasicHashStore) IndexOfHash added in v0.0.7

func (bhs *BasicHashStore) IndexOfHash(hash *Sha256) (int64, error)

IndexOfHash is a very slow, naive implementation, and should only be used for testing

func (*BasicHashStore) Sync

func (bhs *BasicHashStore) Sync() error

type ConcreteHashStoreCreator

type ConcreteHashStoreCreator struct {
	// contains filtered or unexported fields
}

func NewConcreteHashStoreCreator

func NewConcreteHashStoreCreator(name string, folder string,
	partialHashBitCount int64, entryByteCount int64, collisionsPerChunk int64,
	useMemFileForLookup bool, useAppendOptimizedForHashes bool) (*ConcreteHashStoreCreator, error)

func (*ConcreteHashStoreCreator) CreateHashStore

func (hsc *ConcreteHashStoreCreator) CreateHashStore() error

func (*ConcreteHashStoreCreator) HashStoreExists

func (hsc *ConcreteHashStoreCreator) HashStoreExists() bool

func (*ConcreteHashStoreCreator) OpenHashStore

func (hsc *ConcreteHashStoreCreator) OpenHashStore() (HashReadWriter, error)

func (*ConcreteHashStoreCreator) OpenHashStoreReadOnly

func (hsc *ConcreteHashStoreCreator) OpenHashStoreReadOnly() (HashReader, error)

type ConcreteHashStoreCreatorMemory added in v0.0.12

type ConcreteHashStoreCreatorMemory struct {
	// contains filtered or unexported fields
}

func NewConcreteHashStoreCreatorMemory added in v0.0.12

func NewConcreteHashStoreCreatorMemory(name string, folder string) (*ConcreteHashStoreCreatorMemory, error)

func (*ConcreteHashStoreCreatorMemory) CreateHashStore added in v0.0.12

func (hsc *ConcreteHashStoreCreatorMemory) CreateHashStore() error

func (*ConcreteHashStoreCreatorMemory) HashStoreExists added in v0.0.12

func (hsc *ConcreteHashStoreCreatorMemory) HashStoreExists() bool

func (*ConcreteHashStoreCreatorMemory) OpenHashStore added in v0.0.12

func (hsc *ConcreteHashStoreCreatorMemory) OpenHashStore() (HashReadWriter, error)

func (*ConcreteHashStoreCreatorMemory) OpenHashStoreReadOnly added in v0.0.12

func (hsc *ConcreteHashStoreCreatorMemory) OpenHashStoreReadOnly() (HashReader, error)

type HashReadWriter

type HashReadWriter interface {
	HashReader
	AppendHash(hash *Sha256) (int64, error)
	Sync() error
}

type HashReader

type HashReader interface {
	IndexOfHash(hash *Sha256) (int64, error)
	GetHashAtIndex(index int64, hash *Sha256) error
	CountHashes() (int64, error)
	Close() error
}

type HashStore

type HashStore struct {
	// contains filtered or unexported fields
}

func NewHashStore

func NewHashStore(partialHashBitCount int64, entryByteCount int64, collisionsPerChunk int64,
	hashesFile *BasicHashStore,
	lookupsFile memfile.SparseLookupFile, collisionsFile *os.File) *HashStore

func (*HashStore) AppendHash

func (hs *HashStore) AppendHash(hash *Sha256) (int64, error)

func (*HashStore) Close

func (hs *HashStore) Close() error

func (*HashStore) CountHashes

func (hs *HashStore) CountHashes() (int64, error)

func (*HashStore) GetHashAtIndex

func (hs *HashStore) GetHashAtIndex(index int64, hash *Sha256) error

func (*HashStore) IndexOfHash

func (hs *HashStore) IndexOfHash(hash *Sha256) (int64, error)

IndexOfHash returns -1 to indicate "Not Present"; the error will be nil if that is all that is wrong

func (*HashStore) Sync

func (hs *HashStore) Sync() error

type HashStoreCreator

type HashStoreCreator interface {
	HashStoreExists() bool
	CreateHashStore() error
	OpenHashStore() (HashReadWriter, error)
	OpenHashStoreReadOnly() (HashReader, error)
}

type MemoryIndexedHashes added in v0.0.12

type MemoryIndexedHashes struct {
	// contains filtered or unexported fields
}

func NewMemoryIndexedHashes added in v0.0.12

func NewMemoryIndexedHashes(hashFile *wordfile.HashFile) *MemoryIndexedHashes

func (*MemoryIndexedHashes) AppendHash added in v0.0.12

func (mih *MemoryIndexedHashes) AppendHash(hash *Sha256) (int64, error)

func (*MemoryIndexedHashes) Close added in v0.0.12

func (mih *MemoryIndexedHashes) Close() error

func (*MemoryIndexedHashes) CountHashes added in v0.0.12

func (mih *MemoryIndexedHashes) CountHashes() (int64, error)

func (*MemoryIndexedHashes) GetHashAtIndex added in v0.0.12

func (mih *MemoryIndexedHashes) GetHashAtIndex(index int64, hash *Sha256) error

func (*MemoryIndexedHashes) IndexOfHash added in v0.0.12

func (mih *MemoryIndexedHashes) IndexOfHash(hash *Sha256) (int64, error)

func (*MemoryIndexedHashes) Sync added in v0.0.12

func (mih *MemoryIndexedHashes) Sync() error

type MultipassPreloader added in v0.0.12

type MultipassPreloader struct {
	// contains filtered or unexported fields
}

MultipassPreloader is a system that preloads a UniformHashStore fileset. To minimize memory usage, it reads the hashes file multiple times, each time concentrating on a different set of address bins (the set is known as a dumpster).

func NewMultipassPreloader added in v0.0.12

func NewMultipassPreloader(creator *UniformHashStoreCreator, bytesPerPass int64) *MultipassPreloader

func (*MultipassPreloader) CreateInitialFiles added in v0.0.12

func (mp *MultipassPreloader) CreateInitialFiles() error

func (*MultipassPreloader) IndexTheHashes added in v0.0.12

func (mp *MultipassPreloader) IndexTheHashes() error

type ReadWriteSeekCloser

type ReadWriteSeekCloser interface {
	io.ReadWriteSeeker
	io.Closer
}

type Sha256

type Sha256 [32]byte

func HashOfInt added in v0.0.12

func HashOfInt(in uint64) Sha256

type SinglePassDetails added in v0.0.12

type SinglePassDetails struct {
	// contains filtered or unexported fields
}

SinglePassDetails holds details of one of the multiple passes

func NewSinglePassDetails added in v0.0.12

func NewSinglePassDetails(firstAddress int64, addressCount int64) *SinglePassDetails

func (*SinglePassDetails) ReadIn added in v0.0.12

func (spd *SinglePassDetails) ReadIn(mp *MultipassPreloader) error

type UniformHashPreLoader added in v0.0.12

type UniformHashPreLoader struct {
	// contains filtered or unexported fields
}

func NewUniformHashPreLoader added in v0.0.12

func NewUniformHashPreLoader(us *UniformHashStore, blocker *memblocker.MemBlocker) *UniformHashPreLoader

type UniformHashStore added in v0.0.12

type UniformHashStore struct {
	// contains filtered or unexported fields
}

func (*UniformHashStore) AppendHash added in v0.0.12

func (us *UniformHashStore) AppendHash(hash *Sha256) (int64, error)

func (*UniformHashStore) Close added in v0.0.12

func (us *UniformHashStore) Close() error

func (*UniformHashStore) CountHashes added in v0.0.12

func (us *UniformHashStore) CountHashes() (int64, error)

func (*UniformHashStore) GetHashAtIndex added in v0.0.12

func (us *UniformHashStore) GetHashAtIndex(index int64, hash *Sha256) error

func (*UniformHashStore) IndexOfHash added in v0.0.12

func (us *UniformHashStore) IndexOfHash(hash *Sha256) (int64, error)

func (*UniformHashStore) Sync added in v0.0.12

func (us *UniformHashStore) Sync() error

func (*UniformHashStore) Test added in v0.0.12

func (us *UniformHashStore) Test() (bool, error)

type UniformHashStoreCreator added in v0.0.12

type UniformHashStoreCreator struct {
	// contains filtered or unexported fields
}

func NewUniformHashStoreCreatorPrivate added in v0.0.12

func NewUniformHashStoreCreatorPrivate(hashCountEstimate int64,
	folder string, name string, digitsPerFolder int) *UniformHashStoreCreator

func (*UniformHashStoreCreator) CreateHashStore added in v0.0.12

func (uc *UniformHashStoreCreator) CreateHashStore() error

func (*UniformHashStoreCreator) HashStoreExists added in v0.0.12

func (uc *UniformHashStoreCreator) HashStoreExists() bool

func (*UniformHashStoreCreator) OpenHashStore added in v0.0.12

func (uc *UniformHashStoreCreator) OpenHashStore() (HashReadWriter, error)

func (*UniformHashStoreCreator) OpenHashStoreReadOnly added in v0.0.12

func (uc *UniformHashStoreCreator) OpenHashStoreReadOnly() (HashReader, error)

type UniformHashStoreParams added in v0.0.12

type UniformHashStoreParams struct {
	HashDivider     uint64 `json:"hashDivider"`
	DigitsPerFolder int    `json:"digitsPerFolder"`
}
