base62

package

v1.1.3 Latest Latest Go to latest Published: Aug 17, 2022 License: MIT, MIT Imports: 4 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/fufuok/basex

Links

Open Source Insights

README ¶

base62

base62 is a correctly implemented, compact and fast implementation of Base62 encoding/decoding algorithm. It is inspired by the java implementation by glowfall.

This Base62 implementation can encode/decode both integers and bytes of arbitrary length, the correctness is tested using a large number of randomly generated bytes.

It is much faster than big.Int based implementation and is not much slower than typical Base64 implementations. See the benchmark results below.

Why Base62

In comparison with Base64/Base32, Base62 is more friendly to human:

it contains only alpha-numerical symbols, no special characters
can be validated by eyes and more simple regexp
can be fully selected by mouse double-click in any text editors and browser address bar
it's compact and generates shorter strings than Base32
it's the most natural and unambiguous way to encode your data in human-readable form :)

Variations of Base62 algorithm cann be widely used to represent authentication data in printable and easy-copyable form, for example to encode OAuth 2.0 access_token data.

Usage

// Basic usage.
Encode(src []byte) []byte
EncodeToString(src []byte) string
Decode(src []byte) ([]byte, error)
DecodeString(src string) ([]byte, error)
FormatInt(num int64) []byte
FormatUint(num uint64) []byte
ParseInt(src []byte) (int64, error)
ParseUint(src []byte) (uint64, error)

// Providing a dst buffer, you may reuse buffers to reduce memory allocation.
EncodeToBuf(dst []byte, src []byte) []byte
DecodeToBuf(dst []byte, src []byte) ([]byte, error)
AppendInt(dst []byte, num int64) []byte
AppendUint(dst []byte, num uint64) []byte

// Or you may use a custom encoding alphabet.
enc := NewEncoding("...my-62-byte-string-alphabet...")
enc.XXX()

Benchmark

Benchmark_Encode-12                     11654754                97.41 ns/op
Benchmark_Decode-12                     15481666                73.60 ns/op
Benchmark_EncodeToString-12             11950086                99.54 ns/op
Benchmark_DecodeString-12               16301325                74.36 ns/op

Benchmark_EncodeToBuf-12                13855840                84.68 ns/op
Benchmark_DecodeToBuf-12                97695962                12.21 ns/op

Benchmark_EncodeInteger-12              29119437                41.30 ns/op
Benchmark_DecodeInteger-12             120328183                 9.917 ns/op

Benchmark_Encode_BigInt-12               1000000              1048 ns/op

Benchmark_Base64_EncodeToString-12      17803440                70.12 ns/op
Benchmark_Base64_DecodeString-12        19884616                55.09 ns/op

Benchmark_Base64_Encode-12              68163142                17.93 ns/op
Benchmark_Base64_Decode-12              41990004                28.25 ns/op

How it works

Encoding Base64 and Base32 both are bit-aligned, 6 bits for Base64 (2^6 = 64) and 5 bits for Base32 (2^5 = 32), but Base62 is not bit-aligned, it's inefficient to do divmod operation for non-bit-aligned integers, typical Base62 implementations are BigInt based, which encodes input data block by block to get better performance, (e.g. https://github.com/keybase/saltpack/blob/master/encoding/basex).

64 characters can fully represent 6 bits, but 62 characters can not, if each character represents 5 bits, that is the Base32 encoding.

A naive BigInt based algorithm gives the shortest result, which is roughly like this (e.g. https://github.com/eknkc/basex/blob/6baac8ea8b19cc66d125286d213770fec0691867/basex.go#L46):

digits := []int{0}
for i := 0; i < len(src); i++ {
    carry := int(src[i])

    for j := 0; j < len(digits); j++ {
        carry += digits[j] << 8
        digits[j] = carry % e.base
        carry = carry / e.base
    }

    for carry > 0 {
        digits = append(digits, carry%e.base)
        carry = carry / e.base
    }
}
// map to encoding alphabet
_ = digits

This works but the time complexity is O(n^2) where n is the length of src. If the input can be very large, the cost is unacceptable.

Improved algorithm splits the input into blocks, which reduce the time complexity to O(kn) where k is the block size, and n is the length of src, it generates slightly longer result than naive BigInt algorithm, but it is worth for the reduced time complexity. When k is n, it degrades to the naive algorithm, when k is 1, the output length is twice of the input, we don't want that. 32 is chosen as the block size in library saltpack/encoding/basex.

Inspired by the java implementation by glowfall, this library is not BigInt based, it encodes and decodes any arbitrary bytes in O(n) time complexity. Here is a brief description of the algorithm.

This library uses a variadic length encoding, for each 6 bits, if the value is in range [0, 62), it can be directly map to the 62 characters, if the value is 62 or 63 which exceeds 61, we turn to encoding just the lower 5 bits. If we find a way to recognize the 5 bits pattern, then we can correctly decode it back to the source data.

The binary representation of 62 and 63 is:

62 - 0b_0011_1110
63 - 0b_0011_1111

They have a common mask 0b_0011_1110, in range [0, 62), there are another two integers have a similar mask 0b_0001_1110, 30 and 31, which is:

30 - 0b_0001_1110
31 - 0b_0001_1111

The four integers 30, 31, 62, 63 share a common mask 0b_0001_1110, while all other integers in range [0, 64) don't share the mask, i.e. for all other integers, the expression value & 0b_0001_1110 == 0b_0001_1110 evaluates to false.

This is the key point!

We define a compactMask as 0b_0001_1110.

When encoding, for each 6 bits integer x, if x & compactMask == compactMask is true, it must be one of 30, 31, 62 or 63, we just encode the lower 5 bits, which are 0b_0001_1110 (30) and 0b_0001_1111 (31), we leave the 6-th bit to next byte.

When decoding, for each encoded byte x, we check x & compactMask == compactMask, when it is false, we know that the byte represents 6 bits in the raw data, it is in range [0, 30) or [32, 62), else it represents 5 bits in the raw data, it is 30 or 31.

That is it, by using variadic length encoding, we successfully limit the value range to [0, 62), we get very compact result and a simple O(n) time complexity for encoding and decoding data of arbitrary length.

Compatibility

This library guarantees that it can correctly decode data encoded by itself.

The encoded result is not compatible with BigInt based algorithm, (e.g. saltpack/encoding/basex, GMP and GnuPG).

The algorithm may be ported to other languages in future (if someone does a porting, I'm glad to link it here :-) ).

Changelog

v1.1.0 - 2022/1/3

Refactor the encoding code to be simpler and cleaner, as a bonus, it gives better performance which is 1.5X~ faster than the old.
Add a brief description of the algorithm and state the compatibility with other Base62 implementations.

v1.0.0 - 2021/10/23

First stable release, the package has been used in several small and medium projects.

This release adds new APIs which help to reuse buffers to reduce memory allocation.

Documentation ¶

Index ¶

Variables
func AppendInt(dst []byte, num int64) []byte
func AppendUint(dst []byte, num uint64) []byte
func Decode(src []byte) ([]byte, error)
func DecodeString(src string) ([]byte, error)
func DecodeToBuf(dst []byte, src []byte) ([]byte, error)
func Encode(src []byte) []byte
func EncodeToBuf(dst []byte, src []byte) []byte
func EncodeToString(src []byte) string
func FormatInt(num int64) []byte
func FormatUint(num uint64) []byte
func ParseInt(src []byte) (int64, error)
func ParseUint(src []byte) (uint64, error)
type CorruptInputError
- func (e CorruptInputError) Error() string
type Encoding
- func NewEncoding(encoder string) *Encoding

Constants ¶

This section is empty.

Variables ¶

View Source

var StdEncoding = NewEncoding(encodeStd)

StdEncoding is the default base62 encoding using alphabet [A-Za-z0-9].

Functions ¶

func AppendInt ¶

func AppendInt(dst []byte, num int64) []byte

AppendInt appends the base62 representation of the integer num using StdEncoding, to dst and returns the extended buffer.

func AppendUint ¶

func AppendUint(dst []byte, num uint64) []byte

AppendUint appends the base62 representation of the unsigned integer num using StdEncoding, to dst and returns the extended buffer.

func Decode ¶

func Decode(src []byte) ([]byte, error)

Decode decodes src using StdEncoding, returns the decoded bytes.

If src contains invalid base62 data, it will return nil and CorruptInputError.

func DecodeString ¶

func DecodeString(src string) ([]byte, error)

DecodeString returns the bytes represented by the base62 string src using StdEncoding.

func DecodeToBuf ¶

func DecodeToBuf(dst []byte, src []byte) ([]byte, error)

DecodeToBuf decodes src using StdEncoding, appending the decoded bytes to dst. If dst has not enough capacity, it copies dst and returns the extended buffer.

If src contains invalid base62 data, it will return nil and CorruptInputError.

func Encode ¶

func Encode(src []byte) []byte

Encode encodes src using StdEncoding, returns the encoded bytes.

func EncodeToBuf ¶

func EncodeToBuf(dst []byte, src []byte) []byte

EncodeToBuf encodes src using StdEncoding, appending the encoded bytes to dst. If dst has not enough capacity, it copies dst and returns the extended buffer.

func EncodeToString ¶

func EncodeToString(src []byte) string

EncodeToString returns a base62 string representation of src using StdEncoding.

func FormatInt ¶

func FormatInt(num int64) []byte

FormatInt encodes an integer num to base62 using StdEncoding.

func FormatUint ¶

func FormatUint(num uint64) []byte

FormatUint encodes an unsigned integer num to base62 using StdEncoding.

func ParseInt ¶

func ParseInt(src []byte) (int64, error)

ParseInt returns an integer from its base62 representation using StdEncoding.

If src contains invalid base62 data, it returns 0 and CorruptInputError.

func ParseUint ¶

func ParseUint(src []byte) (uint64, error)

ParseUint returns an unsigned integer from its base62 representation using StdEncoding.

If src contains invalid base62 data, it returns 0 and CorruptInputError.

Types ¶

type CorruptInputError ¶

type CorruptInputError int64

func (CorruptInputError) Error ¶

func (e CorruptInputError) Error() string

type Encoding ¶

type Encoding struct {
	// contains filtered or unexported fields
}

An Encoding is a radix 62 encoding/decoding scheme, defined by a 62-character alphabet.

func NewEncoding ¶

func NewEncoding(encoder string) *Encoding

NewEncoding returns a new Encoding defined by the given alphabet, which must be a 62-byte string that does not contain CR / LF ('\r', '\n').

func (*Encoding) AppendInt ¶

func (enc *Encoding) AppendInt(dst []byte, num int64) []byte

AppendInt appends the base62 representation of the integer num, as generated by FormatInt, to dst and returns the extended buffer.

func (*Encoding) AppendUint ¶

func (enc *Encoding) AppendUint(dst []byte, num uint64) []byte

AppendUint appends the base62 representation of the unsigned integer num, as generated by FormatUint, to dst and returns the extended buffer.

func (*Encoding) Decode ¶

func (enc *Encoding) Decode(src []byte) ([]byte, error)

Decode decodes src using the encoding enc, returns the decoded bytes.

If src contains invalid base62 data, it will return nil and CorruptInputError.

func (*Encoding) DecodeString ¶

func (enc *Encoding) DecodeString(src string) ([]byte, error)

DecodeString returns the bytes represented by the base62 string src.

func (*Encoding) DecodeToBuf ¶

func (enc *Encoding) DecodeToBuf(dst []byte, src []byte) ([]byte, error)

DecodeToBuf decodes src using the encoding enc, appending the decoded bytes to dst. If dst has not enough capacity, it copies dst and returns the extended buffer.

If src contains invalid base62 data, it will return nil and CorruptInputError.

func (*Encoding) Encode ¶

func (enc *Encoding) Encode(src []byte) []byte

Encode encodes src using the encoding enc, returns the encoded bytes.

func (*Encoding) EncodeToBuf ¶

func (enc *Encoding) EncodeToBuf(dst []byte, src []byte) []byte

EncodeToBuf encodes src using the encoding enc, appending the encoded bytes to dst. If dst has not enough capacity, it copies dst and returns the extended buffer.

func (*Encoding) EncodeToString ¶

func (enc *Encoding) EncodeToString(src []byte) string

EncodeToString returns a base62 string representation of src.

func (*Encoding) FormatInt ¶

func (enc *Encoding) FormatInt(num int64) []byte

FormatInt encodes an integer num to base62 using the encoding enc.

func (*Encoding) FormatUint ¶

func (enc *Encoding) FormatUint(num uint64) []byte

FormatUint encodes an unsigned integer num to base62 using the encoding enc.

func (*Encoding) ParseInt ¶

func (enc *Encoding) ParseInt(src []byte) (int64, error)

ParseInt returns an integer from its base62 representation.

If src contains invalid base62 data, it returns 0 and CorruptInputError.

func (*Encoding) ParseUint ¶

func (enc *Encoding) ParseUint(src []byte) (uint64, error)

ParseUint returns an unsigned integer from its base62 representation.

If src contains invalid base62 data, it returns 0 and CorruptInputError.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL