apophenia

package module
v0.0.0-...-68b7a14 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 27, 2019 License: BSD-3-Clause Imports: 8 Imported by: 5

README

Apophenia -- seeking patterns in randomness

Apophenia provides an approximate emulation of a seekable pseudo-random number generator. You provide a seed, and get a generator which can generate a large number of pseudo-random bits which will occur in a predictable pattern, but you can seek anywhere in that pattern in constant time.

Apophenia's interface is intended to be similar to that of stdlib's math/rand; the Sequence interface type is a strict superset of rand.Source64. However, you can use the sequence's Seek method to control what the corresponding source will generate next. (In principle, you could implement other distinct sequence types, but the AES-128 implementation is the only one right now.)

If you want something similar to this, but intended for cryptographic purposes, look at Fortuna, which is similar in some ways, but intended to be usable in cryptographic contexts: https://www.schneier.com/academic/fortuna/

The original use case for apophenia was an internal tool which generates data sets, where we wanted to be able to generate the same resulting set of data, while generating the data in different orders; the simplest way to do that is to ensure that, for a given position in a matrix, you are generating the same random bits when you generate that position, regardless of which other positions have already been generated, or in what order.

Implementation Notes

A PRNG should generate a sequence of unpredictable bits, with high entropy, and minimal ability to predict future bits from previous bits.

If you view the input to AES-128 as a 128-bit number, the outputs for sequential numbers are unpredictable -- knowing the current number or its hash result gives you no information about the bits in the hash result of the next number. So AES-128 plus a sequence of 128-bit values ranging from 0 to (1<<128)-1 is just like a PRNG. But you can seek to any location within its stream and generate the bits at that location in constant time, because you never need to know about a previous "state".

So AES-128, seeded, is equivalent to a PRNG which generates random bits with a period of (1<<135).

This design may have serious fundamental flaws, but it worked out in light testing and I'm an optimist. Most obviously, if you generate a mere 2^64 consecutive blocks that are genuinely random, the expected number of collisions is about 1; the expected number of collisions with apophenia is exactly 0.

Sequences and Offsets

Apophenia's underlying implementation admits 128-bit keys, and 128-bit offsets within each sequence. In most cases:

  • That's more space than we need.
  • Working with a non-native type for item numbers is annoying, but 64 bits is enough range.
  • It would be nice to avoid using the same pseudo-random values for different things.
  • Even when those things have the same basic identifying ID or value.

For instance, say you wish to generate a few billion objects, each of which has multiple associated values, and you want to generate the values randomly. Some values might follow a Zipf distribution, others might just be "U % N" for some N. If you use the item number as a key, and seek to the same position for each of these, and get the same bits for each of these, you may get unintended similarities or correlations between them.

With this in mind, apophenia divides its 128-bit offset space into a number of spaces. 8 of the bits are used for a sequence-type value, one of:

  • SequenceDefault
  • SequencePermutationK/SequencePermutationF: permutations
  • SequenceWeighted: weighted bits
  • SequenceLinear: linear values within a range
  • SequenceZipfU: uniforms to use for Zipf values
  • SequenceRandSource: default offsets for the rand.Source
  • SequenceUser1/SequenceUser2: reserved for non-apophenia usage

Other values are not yet defined, but are reserved.

Within most of these spaces, the rest of the high word of the offset is used for a 'seed' (used to select different sequences) and an 'iteration' (used for successive values consumed by an algorithm). For instance, the Zipf implementation, much like the Go stdlib implementation, may consume multiple values; it does so by requesting blocks with consecutive iteration values.

The low-order word is treated as a 64-bit item ID.

High-order word:
0-23                     24-31   32-63
[iteration              ][seq   ][seed                         ]

Low-order word:
0-63
[id                                                            ]

The convenience function OffsetFor(sequence, seed, iteration, id) supports this usage.

As a side effect, if generating additional values for a given seed and id, you can increment the high-order word of the Uint128, and if generating values for a new id, you can increment the low-order word. If your algorithm consumes more than 2^24 values for a single operation, incrementing will bump the sequence value -- meaning you start generating values that were used for iterations of a different kind of sequence entirely, not values used for a different seed of this sequence.

This division of space is arbitrary, and you're welcome to ignore it and use the space however you like.

Permutations

Apophenia provides a permutation generator, which generates the values from 0 through N in an arbitrary order. For even small N, the number of theoretically possible permutations vastly exceeds the space of possible seed values, so only a vanishingly small number of possible permutations will ever be generated (around 2^64 for any given apophenia seed), but in practice that's still quite a lot.

Zipf Distribution

Apophenia provides a seekable Zipf generator. This is basically equivalent to using stdlib's Zipf type with an apophenia.Sequence as the rand.Source backing its rand.Rand, only you don't have to mess around with it as much to control that. Also, this one uses q and v rather than s and v as the parameter names to align with the paper it was derived from.

Iteration usage

For the built-in consumers:

  • Weighted consumes log2(scale) iterated values.
  • Zipf consumes an average of no more than about 1.1 values.
  • Permutation consumes one iterated value per 128 rounds of permutation, where rounds is equal to 6*ceil(log2(max)). (For instance, a second value is consumed around a maximum of 2^22, and a third around 2^43.)
  • Nothing else uses more than one iterated value.

Documentation

Overview

Package apophenia provides seekable pseudo-random numbers, allowing reproducibility of pseudo-random results regardless of the order they're generated in.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Permutation

type Permutation struct {
	// contains filtered or unexported fields
}

PermutationGenerator provides a way to pass integer IDs through a permutation map that is pseudorandom but repeatable. This could be done with `rand.Perm`, but that would require storing a slice of `[Items]int64`, which we want to avoid for large values of Items.

Not actually cryptographically secure.

This implementation is based on: http://arxiv.org/abs/1208.1176v2

This simulates the results of a shuffle in a way allowing a lookup of the results of the shuffle for any given position, in time proportional to a number of "rounds", each of which is 50% likely to swap a slot with another slot. The number of rounds needed to achieve a reasonable probability of safety is `log(N)*6` or so.

Each permutation is fully defined by a "key", consisting of:

  1. A key "KF" naming a value in [0,max) for each round.
  2. A series of round functions mapping values in [0,max) to bits, one for each round.

I refer to these as K[r] and F[r]. Thus, K[0] is the index used to compute swap operations four round 0, and F[0] is the series of bits used to determine whether a swap is performed, with F[0][0] being the swap decision for slot 0 in round 0. (Except it probably isn't, because the swap decision is actually made based on the highest index in a pair, to ensure that a swap between A and B always uses the same decision bit.)

K values are generated using the SequencePermutationK range of offsets, with the provided seed, and iteration id 0.

F values are generated using the SequencePermutationF range of offsets, with the provided seed, and iteration ID corresponding to the current round.

For F values, we set byte 8 of the plain text to 0x00, and use encoding/binary to dump the slot number into the first 8 bytes. This yields 128 values, which we treat as the values for the first 128 rounds, and then recycle for rounds 129+ if those exist. This is not very secure, but we're already at 1/2^128 chances by that time and don't care. We could probably trim rounds to 64 or so and not lose much data.

func NewPermutation

func NewPermutation(max int64, seed uint32, src Sequence) (*Permutation, error)

NewPermutation creates a Permutation which generates values in [0,max), from a given Sequence and seed value.

The seed parameter selects different shuffles, and is useful if you need to generate multiple distinct shuffles from the same underlying sequence. Treat it as a secondary seed.

func (*Permutation) Next

func (p *Permutation) Next() (ret int64)

Next generates the next value from the permutation.

func (*Permutation) Nth

func (p *Permutation) Nth(n int64) (ret int64)

Nth generates the Nth value from the permutation. For instance, given a new permutation, calling Next once produces the same value you'd get from calling Nth(0). Seeking using Nth changes the offset that Next counts from; after calling Nth(x), you would get the same result from Next() that you would from Nth(x+1).

Negative offsets count from the end; if p.max is 10, then Nth(9) and Nth(-1) produce the same value.

type Sequence

type Sequence interface {
	rand.Source64
	Seek(Uint128) Uint128
	BitsAt(Uint128) Uint128
}

Sequence represents a specific deterministic but pseudo-random-ish series of bits. A Sequence can be used as a `rand.Source` or `rand.Source64` for `math/rand`.

func NewSequence

func NewSequence(seed int64) Sequence

NewSequence generates a sequence initialized with the given seed.

type SequenceClass

type SequenceClass uint8

SequenceClass denotes one of the sequence types, which are used to allow sequences to avoid hitting each other's pseudo-random results.

const (
	// SequenceDefault is the zero value, used if you didn't think to pick one.
	SequenceDefault SequenceClass = iota
	// SequencePermutationK is the K values for the permutation algorithm.
	SequencePermutationK
	// SequencePermutationF is the F values for the permutation algorithm.
	SequencePermutationF
	// SequenceWeighted is used to generate weighted values for a given
	// position.
	SequenceWeighted
	// SequenceLinear is the random numbers for U%N type usage.
	SequenceLinear
	// SequenceZipfU is the random numbers for the Zipf computations.
	SequenceZipfU
	// SequenceRandSource is used by default when a Sequence is being
	// used as a rand.Source.
	SequenceRandSource
	// SequenceUser1 is eserved for non-apophenia package usage.
	SequenceUser1
	// SequenceUser2 is reserved for non-apophenia package usage.
	SequenceUser2
)

type Uint128

type Uint128 struct {
	Lo, Hi uint64 // low-order and high-order uint64 words. Value is “(Hi << 64) | Lo`.
}

Uint128 is a pair of uint64, treated as a single object to simplify calling conventions. It's a struct rather than an array for two reasons:

1. The go compiler seems better at this.

2. [0] and [1] are ambiguous, .Lo and .Hi aren't.

func Mask

func Mask(n uint64) (u Uint128)

Mask produces a bitmask with n bits set.

func OffsetFor

func OffsetFor(class SequenceClass, seed uint32, iter uint32, id uint64) Uint128

OffsetFor determines the Uint128 offset for a given class/seed/iteration/id.

func (*Uint128) Add

func (u *Uint128) Add(value Uint128)

Add adds value to its receiver in place.

func (*Uint128) And

func (u *Uint128) And(value Uint128)

And does a bitwise and with value, in place.

func (*Uint128) Bit

func (u *Uint128) Bit(n uint64) uint64

Bit returns 1 if the nth bit is set, 0 otherwise.

func (*Uint128) Inc

func (u *Uint128) Inc()

Inc increments its receiver in place.

func (*Uint128) Mask

func (u *Uint128) Mask(n uint64)

Mask produces a mask of the lower n bits of u.

func (*Uint128) Not

func (u *Uint128) Not()

Not does a bitwise complement in place.

func (*Uint128) Or

func (u *Uint128) Or(value Uint128)

Or does a bitwise or with value, in place.

func (*Uint128) RotateLeft

func (u *Uint128) RotateLeft(n uint64)

RotateLeft rotates u left by n bits.

func (*Uint128) RotateRight

func (u *Uint128) RotateRight(n uint64)

RotateRight rotates u right by n bits.

func (*Uint128) ShiftLeft

func (u *Uint128) ShiftLeft(n uint64)

ShiftLeft rotates u left by n bits.

func (*Uint128) ShiftRight

func (u *Uint128) ShiftRight(n uint64)

ShiftRight shifts u right by n bits.

func (*Uint128) ShiftRightCarry

func (u *Uint128) ShiftRightCarry(n uint64) (out Uint128, carry uint64)

ShiftRightCarry returns both the shifted value and the bits that were shifted out. Useful for when you want both x%N and x/N for N a power of 2. Only sane if bits <= 64.

func (Uint128) String

func (u Uint128) String() string

String provides a string representation.

func (*Uint128) Sub

func (u *Uint128) Sub(value Uint128)

Sub subtracts value from its receiver in place.

func (*Uint128) Xor

func (u *Uint128) Xor(value Uint128)

Xor does a bitwise xor with value, in place.

type Weighted

type Weighted struct {
	// contains filtered or unexported fields
}

Weighted provides a generator which produces weighted bits -- bits with a specified probability of being set, as opposed to random values weighted a given way. Bit density is specified as N/M, with M a (positive) power of 2, and N an integer between 0 and M.

The underlying generator can produce 1<<128 128-bit values, but the iterative process requires `log2(densityScale)` 128-bit values from the source per 128-bit output, so the top 7 bits of the address are used as an iteration counter. Thus, Weighted.Bit can produce 2^128 distinct values, but if the offset provided to Weighted.Bits is over 2^121, there will be overlap between the source values used for those bits, and source values used (for different iterations) for other offsets.

func NewWeighted

func NewWeighted(src Sequence) (*Weighted, error)

NewWeighted yields a new Weighted using the given sequence as a source of seekable pseudo-random bits.

func (*Weighted) Bit

func (w *Weighted) Bit(offset Uint128, density uint64, scale uint64) uint64

Bit returns the single 0-or-1 bit at the specified offset.

func (*Weighted) Bits

func (w *Weighted) Bits(offset Uint128, density uint64, scale uint64) (out Uint128)

Bits returns the 128-bit set of bits including offset. The column portion of offset is right-shifted by 7 to match the offset calculations in Bit(), above. Thus, you get the same values back for each sequence of 128 consecutive offsets.

func (*Weighted) NextBits

func (w *Weighted) NextBits(density, scale uint64) (out Uint128)

NextBits returns the next batch of bits after the last one retrieved.

type Zipf

type Zipf struct {
	// contains filtered or unexported fields
}

Zipf produces a series of values following a Zipf distribution. It is initialized with values q, v, and max, and produces values in the range [0,max) such that the probability of a value k is proportional to (v+k) ** -q. The input value v must be >= 1, and q must be > 1.

This is based on the same paper used for the golang stdlib Zipf distribution:

"Rejection-Inversion to Generate Variates from Monotone Discrete Distributions" W.Hormann, G.Derflinger [1996] http://eeyore.wu-wien.ac.at/papers/96-04-04.wh-der.ps.gz

This implementation differs from stdlib's in that it is seekable; you can get the Nth value in a theoretical series of results in constant time, without having to generate the whole series linearly, and that we used the names q and v rather than s and v.

func NewZipf

func NewZipf(q float64, v float64, max uint64, seed uint32, src Sequence) (z *Zipf, err error)

NewZipf returns a new Zipf object with the specified q, v, and max, and with its random source seeded in some way by seed. The sequence of values returned is consistent for a given set of inputs. The seed parameter can select one of multiple sub-sequences of the given sequence.

func (*Zipf) Next

func (z *Zipf) Next() uint64

Next returns the "next" value -- the one after the last one requested, or value 1 if none have been requested before.

func (*Zipf) Nth

func (z *Zipf) Nth(index uint64) uint64

Nth returns the Nth value from the sequence associated with the given Zipf. The value is fully determined by the input values (q, v, max, and seed) and the index.

Directories

Path Synopsis
examples

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL