ordered

package module
v1.1.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 9, 2024 License: BSD-3-Clause Imports: 6 Imported by: 1

README

Ordered defines an encoding of a sequence of basic typed values into []byte and back. The encoding is “ordered” in the sense that bytes.Compare on the byte slices matches cmp.Compare on the original values. The order-preserving encoding is particularly useful for preparing keys in an ordered key/value storage such as Bigtable, Bolt, CockroachDB, LevelDB, Pebble, RocksDB, or Spanner.

This package is inspired by github.com/google/orderedcode but uses a different, self-describing encoding.

See the package documentation for details and examples.

Documentation

Overview

Package ordered defines an encoding of a sequence of basic typed values into []byte and back.

Encode converts a list of values to a []byte; Append is like Encode but appends to an existing buffer.

Decode converts a []byte back to a list of typed values. DecodePrefix is like Decode but decodes only a prefix of the encoded data.

Define the notation E([list]) to mean the encoding of a list of values The encoding has a few important properties:

  • Typed: The types of the values are part of the encoding, so decoding can check that the expected types match the actual types, and decoding can proceed without knowing the types at all.

  • Unique: There is only one possible encoding for a given list of values.

  • Associative: The encoding of a list of values is the concatenation of the encodings of individual values. E([A₁, ..., Aₙ]) = E([A₁]) + ... + E([Aₙ]).

  • Prefix-Preserving: The associative property implies that any encoding of a list of values is ordered less than any longer list sharing the same prefix: E([A₁, ..., Aₘ]) < E([A₁, ..., Aₘ, ..., Aₙ]).

  • Ordered: for any two values A, B of the same type, bytes.Compare(E([A]), E([B])) == cmp.Compare(A, B).

    The associative property implies an extension of the ordered property: for any two lists of values [A₁, ..., Aₙ] and [B₁, ..., Bₙ], where Aᵢ and Bᵢ have the same type for all i, bytes.Compare(E([A₁, ..., Aₙ]), E([B₁, ..., Bₙ])) returns the same result as comparing the lists with cmp.Compare(A₁, B₁), cmp.Compare(A₂, B₂), ..., cmp.Compare(Aₙ, Bₙ), stopping at the first non-zero result.

These properties are particularly useful for preparing keys in an ordered key/value storage such as Bigtable, Bolt, CockroachDB, LevelDB, Pebble, RocksDB, or Spanner.

The encoding is also useful when a basic, efficient, typed serialization is needed.

Infinity

The value Inf of type Infinity has an encoding greater than any non-infinite value's encoding.

The main use for infinity is to provide an upper bound for range scans considering a certain prefix. E([prefix]) < E([prefix, x]) < E([prefix, Inf]) for all x, so E([prefix]) and E([prefix, Inf]) can be used as the scan bounds.

There is no explicit negative infinity value; E[] = "" can be used instead, as was done implicitly in the previous example. Alternately, the reversed infinity Rev(Inf) encodes as "\x00" and can also be used as a negative infinity.

The type StringOrInfinity holds either a string or an infinity.

Reverse Order

It can be necessary to order encodings in the reverse order of the values, so that bytes.Compare orders values from greatest to least. The type Reverse[T] provides that ordering. For x of type T, Rev(x) returns a Reverse[T] holding the value x.

Given x and y of type T, bytes.Compare(E([Rev(x)]), E([Rev(y)])) = cmp.Compare(y, x).

Raw Data

It can be useful and efficient to store a large []byte as itself at the end of an encoding. The type Raw represents such a sequence. It is encoded with a length prefix, so that shorter values are ordered before longer ones, and values of the same length are ordered to match bytes.Compare. When decoding, the extracted values of type []byte are copies of the original slice, not pointers into the original. Raw is an exception: decoding a Raw returns a slice overlapping the original encoded input.

Encodings

This package only supports encoding a limited number of types. The types and their encodings are:

  • string and []byte: The encoding is a 0x01 byte followed by the string data and then a 0x00 0x00 end-of-string marker. In the string data, any 0x00 byte is replaced with 0x00 0xFF, to avoid ambiguity with the marker.

    For example, the encoding of "hello\x00world" is "\x01hello\x00\xFFworld\x00\x00", and the encoding of "" is "\x01\x00\x00".

  • int, uint, int8, uint8, ..., int64, uint64, uintptr: The encoding of a non-negative integer x is a byte 0x30+(n-1) for n in 1..8 where n is the number of significant low bytes of x. That byte is followed by the low n bytes of x, most significant to least significant.

    For example, the encoding of 0x12345 is "\x32\x01\x23\x45", the encoding of 1 is "\x30\x01", and the encoding of 0 is "\x31\x00".

    The encoding of a negative integer x is a byte 0x30-n for n in 1..8 where n is the number of significant low bytes of ^x = -(x+1). That byte is followed by the low n bytes of x, most significant to least significant.

    For example, the encoding of -0x12345 is "\x2D\xFE\xDC\xBB", and the encoding of -1 is "\x2F\xFF".

    Because all integer types share a single encoding, it is valid to encode using one type and decode using a different type, provided the values encoded are in range for the decoding type, and the specific integer type does not play a role in the overall ordering of encodings.

  • float32: The encoding of a float32 f is a 0x02 byte followed by a big-endian uint32 u representing f. If f is a NaN, then u = 0. Otherwise, u is math.Float32bits(f) with the top (sign) bit always inverted and the remaining bits inverted when f < 0.

  • float64: The encoding of a float32 f is a 0x03 byte followed by a big-endian uint64 u representing f. If f is a NaN, then u = 0. Otherwise, u is math.Float64bits(f) with the top (sign) bit always inverted and the remaining bits inverted when f < 0.

    Although the float32 and float64 encodings are similar, they are not the same, so float32 and float64 values cannot be interchanged between encode and decode, and all float32 values order before all float64 values.

  • Infinity: Inf encodes as the byte 0xFF.

  • StringOrInfinity: The encoding is either the encoding of the string or the encoding of infinity, depending on which the value is.

  • Reverse[T]: The encoding of Rev(x) is the encoding of x with all bits negated (equivalently, all bytes XOR'ed with 0xFF).

  • Raw: The encoding is a 0x04 byte followed by the raw bytes. The content is the remainder of the encoded data.

Type Ordering

Code should in general not depend on the relative ordering of values of different types, but the order is: Reverse[Infinity] < string/[]byte < float32 < float64 < Raw < integer < Reverse[integer] < Reverse[float64] < Reverse[float32] < Reverse[string/[]byte] < Infinity.

Compatibility

Because the encodings are expected to be used as keys and values in storage systems, existing encodings will not changed in future versions. New types may be introduced.

Although this package is inspired by github.com/google/orderedcode, this package uses a different, typed (self-describing) encoding.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Append

func Append(enc []byte, list ...any) []byte

Append returns the result of appending the encodings of list to the existing encoding enc. If CanEncode(list...) is not true, then Append panics.

(Note that CanEncode is a property only of the types of the values in list, not their values. At any call site passing a list of valid concrete types, Append can be assumed not to panic.)

func CanEncode

func CanEncode(list ...any) bool

CanEncode reports whether the values x in the list can be encoded. A value can be encoded if its type is []byte, string, Infinity, StringOrInfinity, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, uintptr, float32, float64, or if its type is Reverse[T] for any type T in the preceding list.

func Decode

func Decode(enc []byte, list ...any) error

Decode decodes enc into a list of typed values.

Each value in list must be a non-nil pointer to an encodable type (see CanEncode) or else be a non-nil *any or else be a nil any. When decoding into a *any, the existing value in the any is ignored and replaced with one of the types listed in the documentation for DecodeAny. Decoding into a nil any discards the corresponding value.

Decode checks that there is no encoded data remaining after decoding values into each element of list. To decode only a prefix of the encoding, use DecodePrefix.

func DecodeAny

func DecodeAny(enc []byte) ([]any, error)

DecodeAny decodes enc into a []any. Each element in the returned slice will have one of the following types:

  • string, for both string and []byte values
  • Infinity
  • int64, for integer values representable by int64
  • uint64, for other very large integer values
  • float32
  • float64
  • Reverse[T] for a type earlier in this list
  • Raw for a raw byte slice.

func DecodeFmt added in v1.1.0

func DecodeFmt(enc []byte) (string, error)

DecodeFmt decodes enc into a parenthesized, comma-separated list of textual values. Each value in the list will have one of the following formats:

  • double-quoted string, for both string and []byte values
  • "Inf", for Infinity
  • decimal or negative decimal, for integer values.
  • "float32(f)" for float32 values; f may be NaN or +Inf or -Inf
  • "float64(f)" for float64 values; f may be NaN or +Inf or -Inf
  • "Rev(x)" for a reverse of formatted value x earlier in this list
  • "Raw(q)", where q is a double-quoted string, for a Raw value.

func DecodePrefix

func DecodePrefix(enc []byte, list ...any) (rest []byte, err error)

DecodePrefix decodes a prefix of enc into a list of typed values, returning the rest of the encoding and any error encountered.

Each value in list must be a non-nil pointer to an encodable type (see CanEncode) or else be a non-nil *any or else be a nil any. When decoding into a *any, the existing value in the any is ignored and replaced with one of the types listed in the documentation for DecodeAny. Decoding into a nil any discards the corresponding value.

func Encode

func Encode(list ...any) []byte

Encode returns the encoding of list. If CanEncode(list...) is not true, then Encode panics.

(Note that CanEncode is a property only of the types of the values in list, not their values. At any call site passing a list of valid concrete types, Encode can be assumed not to panic.)

func RevAny

func RevAny(x any) any

RevAny is the untyped version of Rev. It panics if x is not a Reversible type.

Types

type Infinity

type Infinity struct{}

Infinity is the infinity type. Inf is its only value.

var Inf Infinity

Inf is the only value of type Infinity.

type Raw added in v1.1.0

type Raw []byte

Raw is a []byte that encodes as itself, with only a type prefix added. When decoding into a Raw, the Raw is a subslice of the original encoded input, not a copy.

type Reverse

type Reverse[T Reversible] struct {
	// contains filtered or unexported fields
}

Reverse[T] holds a single value of type T, accessible using the [Reverse.value] method. Rev and RevAny make a Reverse[T].

For two values x < y, Encode(x) < Encode(y) but Encode(Rev(x)) > Encode(Rev(y)).

func Rev

func Rev[T Reversible](x T) Reverse[T]

Rev returns the reversed value representing x.

func (Reverse[T]) Value

func (r Reverse[T]) Value() T

Value returns the value x for which r == Rev(x).

type Reversible

type Reversible interface {
	string | []byte |
		int8 | int16 | int32 | int64 | int |
		uint8 | uint16 | uint32 | uint64 | uint | uintptr |
		float32 | float64 |
		Infinity |
		StringOrInfinity
}

Reversible is the type constraint for types that can be used with Reverse.

type StringOrInfinity

type StringOrInfinity struct {
	S   string
	Inf bool
}

A StringOrInfinity represents either a string or an infinity. If Inf is true, it represents infinity, and S is ignored. Otherwise, it represents the string S.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL