bytemap

v0.23.5 (Latest)
Warning

This package is not in the latest version of its module.

Published: Aug 8, 2023 License: MIT Imports: 5 Imported by: 1

README

Bytemap contains types for making maps from bytes to bool, integer, or float using a backing array.

Benchmarks

Micro-benchmarks are usually not a good way to evaluate systems. That said, using a bytemap array can be very fast while also providing readable code.

Let's say you want to test that a string contains only digits. A very fast way is to write a loop:

match := true
for _, c := range []byte(s) {
    if c < '0' || c > '9' {
        match = false
        break
    }
}

This is very fast, but the code is somewhat tedious. One might decide to replace it with a simple regular expression.

r := regexp.MustCompile(`^[0-9]*$`)
match := r.MatchString(s)

This is much shorter, but it's actually a little tricky to read if you're not very familiar with regular expressions, and it's much slower to execute.

Another idea might be to test against a map[byte]bool. This turns out to be almost as slow as the regular expression and about as verbose as the simple loop test.
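For comparison, here is a minimal sketch of the map[byte]bool approach. The names digitSet and allDigitsMap are hypothetical, chosen only for illustration. The map still has to be filled in by hand and the loop still has to be written out, and every lookup goes through Go's hash map instead of a direct array index.

```go
package main

import "fmt"

// digitSet is a hypothetical map-based set of the ASCII digits,
// built only to compare with the loop and bytemap versions.
var digitSet = map[byte]bool{
	'0': true, '1': true, '2': true, '3': true, '4': true,
	'5': true, '6': true, '7': true, '8': true, '9': true,
}

// allDigitsMap reports whether every byte of s is in digitSet.
// Each lookup hashes the key, one reason this is slower than
// indexing an array.
func allDigitsMap(s string) bool {
	for i := 0; i < len(s); i++ {
		if !digitSet[s[i]] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allDigitsMap("12345")) // true
	fmt.Println(allDigitsMap("12a45")) // false
}
```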

A bytemap is short, simple, and fast:

m := bytemap.Make("0123456789")
match := m.Contains(s)

Take these benchmarks with a grain of salt, but they show that a bytemap can perform nearly as well as a handwritten loop:

goos: darwin
goarch: amd64
pkg: github.com/carlmjohnson/bytemap
BenchmarkLoop-8                  184966533       6.314 ns/op
BenchmarkBoolContains-8          162605607       7.503 ns/op
BenchmarkBitFieldContains-8       80200012      16.85 ns/op
BenchmarkMapByteEmpty-8           53849732      23.19 ns/op
BenchmarkMapByteBool-8             6663114     165.9 ns/op
BenchmarkRegexp-8                  6080572     197.5 ns/op
BenchmarkRegexpSlow-8               330384    3252 ns/op

How does it work?

There are only 256 different possible bit patterns in a byte, so bytemap.Bool just preallocates an array of 256 entries.
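A minimal sketch of the idea, assuming nothing about the package's actual source: boolSet, makeSet, and contains are hypothetical names standing in for bytemap.Bool, bytemap.Make, and Contains. Because a byte can only take 256 values, a membership test is a single array index with no hashing. A Go bool occupies one byte, so this array is 256 bytes.

```go
package main

import "fmt"

// boolSet is a hypothetical stand-in for bytemap.Bool: one bool for each
// of the 256 possible byte values.
type boolSet [256]bool

// makeSet returns a set with every byte of seq marked as present.
func makeSet(seq string) *boolSet {
	var m boolSet
	for i := 0; i < len(seq); i++ {
		m[seq[i]] = true
	}
	return &m
}

// contains reports whether every byte of s is present in m.
// Each check is a direct array index, with no hashing.
func (m *boolSet) contains(s string) bool {
	for i := 0; i < len(s); i++ {
		if !m[s[i]] {
			return false
		}
	}
	return true
}

func main() {
	digits := makeSet("0123456789")
	fmt.Println(digits.contains("2023")) // true
	fmt.Println(digits.contains("20x3")) // false
}
```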

bytemap.BitField uses only one bit per entry, which makes it 8 times smaller than bytemap.Bool: only 32 bytes long. In many cases, however, it will be somewhat slower than a bytemap.Bool; in the benchmarks above, BitFieldContains took a little over twice as long as BoolContains.
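The sketch below shows one way to pack 256 entries into 32 bytes. bitSet, set, and get are hypothetical names; the sketch assumes key k is stored in element k/8 at bit k%8, which may not match how bytemap.BitField orders its bits. The extra shift and mask on every lookup are a plausible reason this layout runs slower than a plain [256]bool.

```go
package main

import "fmt"

// bitSet is a hypothetical stand-in for bytemap.BitField: 256 bits packed
// into 32 bytes. This sketch stores key k in element k/8 at bit k%8.
type bitSet [32]byte

// set turns the bit for key on or off.
func (m *bitSet) set(key byte, value bool) {
	mask := byte(1) << (key % 8)
	if value {
		m[key/8] |= mask
	} else {
		m[key/8] &^= mask // clear the bit
	}
}

// get reports whether the bit for key is on.
func (m *bitSet) get(key byte) bool {
	return m[key/8]&(byte(1)<<(key%8)) != 0
}

func main() {
	var m bitSet
	m.set('a', true)
	m.set('z', true)
	m.set('z', false)
	fmt.Println(m.get('a'), m.get('z'), len(m)) // true false 32
}
```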

Documentation

Overview

Package bytemap contains types for making maps from bytes to bool, integer, or float. The maps are backed by arrays of 256 entries.

Constants

const BitFieldLen = Len / 8

BitFieldLen is the length of a BitField array, 32 bytes.

const Len = 1 << 8

Len is the length of a byte map array, 256 items.

Variables

This section is empty.

Functions

This section is empty.

Types

type BitField

type BitField [BitFieldLen]byte

BitField is a map from byte to bool backed by a bit field. It is not as fast as Bool, but if memory is an important consideration, it is 8 times smaller.

func (*BitField) Clone

func (m *BitField) Clone() *BitField

Clone copies m.

func (*BitField) Contains

func (m *BitField) Contains(s string) bool

Contains reports whether all bytes in s are already in m.

func (*BitField) ContainsBytes

func (m *BitField) ContainsBytes(b []byte) bool

ContainsBytes reports whether all bytes in b are already in m.

func (*BitField) ContainsReader

func (m *BitField) ContainsReader(r io.Reader) (bool, error)

ContainsReader reports whether all bytes read from r are in m. If reading from r fails, it returns false and the error. If it reads to io.EOF and every byte was in m, it returns true, nil.

func (*BitField) Equals

func (m *BitField) Equals(other *BitField) bool

Equals reports whether two BitFields are equal.

func (*BitField) Get

func (m *BitField) Get(key byte) bool

Get looks up one byte in the BitField byte map.

func (*BitField) Set

func (m *BitField) Set(key byte, value bool)

Set sets one byte in the BitField byte map.

func (*BitField) ToBool added in v0.23.3

func (m *BitField) ToBool() *Bool

func (*BitField) ToMap

func (m *BitField) ToMap() map[byte]bool

ToMap makes a map[byte]bool from the bytemap.

func (*BitField) Write

func (m *BitField) Write(p []byte) (int, error)

Write satisfies io.Writer.

func (*BitField) WriteString

func (m *BitField) WriteString(s string) (n int, err error)

WriteString satisfies io.StringWriter.

type Bool

type Bool [Len]bool

Bool is an array-backed map from byte to bool.

func Difference added in v0.23.3

func Difference(m1, m2 *Bool) *Bool

Difference constructs a new Bool containing the members of m1 that are not in m2.

func Intersection added in v0.23.3

func Intersection(m1, m2 *Bool) *Bool

Intersection constructs a new Bool containing the intersection of m1 and m2.
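Because both maps are indexed by byte value, Difference and Intersection reduce to an element-wise boolean expression over the 256 entries. A minimal sketch, with hypothetical names set256, fromString, difference, and intersection standing in for the package's types and functions:

```go
package main

import "fmt"

// set256 is a hypothetical [256]bool set standing in for bytemap.Bool.
type set256 [256]bool

// fromString returns a set with every byte of s marked as present.
func fromString(s string) *set256 {
	var m set256
	for i := 0; i < len(s); i++ {
		m[s[i]] = true
	}
	return &m
}

// difference returns a new set with the members of m1 that are not in m2.
func difference(m1, m2 *set256) *set256 {
	var out set256
	for i := range out {
		out[i] = m1[i] && !m2[i]
	}
	return &out
}

// intersection returns a new set with the members present in both m1 and m2.
func intersection(m1, m2 *set256) *set256 {
	var out set256
	for i := range out {
		out[i] = m1[i] && m2[i]
	}
	return &out
}

func main() {
	word := fromString("banana")
	vowels := fromString("aeiou")
	both := intersection(word, vowels)
	onlyWord := difference(word, vowels)
	fmt.Println(both['a'], both['b'])         // true false
	fmt.Println(onlyWord['b'], onlyWord['a']) // true false
}
```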

func Make

func Make[byteseq []byte | string](seq byteseq) *Bool

Make initializes a bytemap.Bool with a byte sequence.

func Range added in v0.23.3

func Range(start, end byte) *Bool

Range creates a bytemap.Bool in which every byte in the inclusive range from start to end is set. It panics if end is less than start.

Example
package main

import (
	"fmt"

	"github.com/carlmjohnson/bytemap"
)

func main() {
	ascii := bytemap.Range(0, 127)
	fmt.Println(ascii.Contains("Hello, world"))
	fmt.Println(ascii.Contains("Hello, 🌎"))

	upper := bytemap.Range('A', 'Z')
	nonupper := upper.Invert()
	fmt.Println(nonupper.Contains("hello, world!"))

}
Output:

true
false
true
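A minimal sketch of how Range could be built, using hypothetical names byteSet, rangeSet, and contains; the package's actual implementation may differ. One detail worth noting: the loop counter is an int, because a byte counter would wrap from 255 back to 0 when end is 255 and the loop would never terminate.

```go
package main

import "fmt"

// byteSet is a hypothetical [256]bool set standing in for bytemap.Bool.
type byteSet [256]bool

// rangeSet returns a set with every byte from start to end inclusive
// marked as present. Like the documented Range, it panics if end < start.
func rangeSet(start, end byte) *byteSet {
	if end < start {
		panic("rangeSet: end is less than start")
	}
	var m byteSet
	// An int counter is used on purpose: a byte counter would wrap from
	// 255 back to 0 when end == 255, so c <= end would always hold.
	for c := int(start); c <= int(end); c++ {
		m[c] = true
	}
	return &m
}

// contains reports whether every byte of s is present in m.
func (m *byteSet) contains(s string) bool {
	for i := 0; i < len(s); i++ {
		if !m[s[i]] {
			return false
		}
	}
	return true
}

func main() {
	upper := rangeSet('A', 'Z')
	fmt.Println(upper.contains("ABC"), upper.contains("AbC")) // true false
	fmt.Println(rangeSet(0, 255).contains("héllo"))           // true
}
```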

func Union added in v0.23.3

func Union(ms ...*Bool) *Bool

Union constructs a new Bool containing the union of the Bool bytemaps.

Example
package main

import (
	"fmt"

	"github.com/carlmjohnson/bytemap"
)

func main() {
	upper := bytemap.Range('A', 'Z')
	lower := bytemap.Range('a', 'z')
	alpha := bytemap.Union(upper, lower)
	word := bytemap.Union(
		upper,
		lower,
		bytemap.Range('0', '9'),
		bytemap.Make("_"),
	)
	fmt.Println(alpha.Contains("CamelCase"))
	fmt.Println(alpha.Contains("snake_case"))
	fmt.Println(word.Contains("snake_case"))
}
Output:

true
false
true
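Union takes any number of maps, so a sketch can use a variadic parameter and set an entry whenever any input has it. byteSet, fromString, union, and contains are hypothetical names, not the package's API. In this sketch, calling union with no arguments yields an empty set.

```go
package main

import "fmt"

// byteSet is a hypothetical [256]bool set standing in for bytemap.Bool.
type byteSet [256]bool

// fromString returns a set with every byte of s marked as present.
func fromString(s string) *byteSet {
	var m byteSet
	for i := 0; i < len(s); i++ {
		m[s[i]] = true
	}
	return &m
}

// union returns a new set containing every byte present in any of ms.
func union(ms ...*byteSet) *byteSet {
	var out byteSet
	for _, m := range ms {
		for i, present := range m {
			if present {
				out[i] = true
			}
		}
	}
	return &out
}

// contains reports whether every byte of s is present in m.
func (m *byteSet) contains(s string) bool {
	for i := 0; i < len(s); i++ {
		if !m[s[i]] {
			return false
		}
	}
	return true
}

func main() {
	lower := fromString("abcdefghijklmnopqrstuvwxyz")
	word := union(lower, fromString("0123456789"), fromString("_"))
	fmt.Println(word.contains("snake_case_2"), word.contains("kebab-case")) // true false
}
```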

func (*Bool) Clone

func (m *Bool) Clone() *Bool

Clone copies m.

func (*Bool) Contains

func (m *Bool) Contains(s string) bool

Contains reports whether all bytes in s are already in m.

func (*Bool) ContainsBytes

func (m *Bool) ContainsBytes(b []byte) bool

ContainsBytes reports whether all bytes in b are already in m.

func (*Bool) ContainsReader

func (m *Bool) ContainsReader(r io.Reader) (bool, error)

ContainsReader reports whether all bytes read from r are in m. If reading from r fails, it returns false and the error. If it reads to io.EOF and every byte was in m, it returns true, nil.

func (*Bool) Equals

func (m *Bool) Equals(other *Bool) bool

Equals reports whether two Bools are equal.

func (*Bool) Get

func (m *Bool) Get(key byte) bool

Get looks up one byte in the Bool byte map.

func (*Bool) Invert added in v0.23.3

func (m *Bool) Invert() *Bool

Invert returns a copy of m with all values inverted.
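Since Go arrays are values, copying the whole 256-entry array and flipping each entry is enough to invert a map without touching the original. A minimal sketch with a hypothetical byteSet type and invert method:

```go
package main

import "fmt"

// byteSet is a hypothetical [256]bool set standing in for bytemap.Bool.
type byteSet [256]bool

// invert returns a new set with every entry flipped; m is not modified.
func (m *byteSet) invert() *byteSet {
	out := *m // arrays are values, so this copies all 256 entries
	for i := range out {
		out[i] = !out[i]
	}
	return &out
}

func main() {
	var upper byteSet
	for c := 'A'; c <= 'Z'; c++ {
		upper[c] = true
	}
	notUpper := upper.invert()
	fmt.Println(notUpper['a'], notUpper['A'], upper['A']) // true false true
}
```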

func (*Bool) Set

func (m *Bool) Set(key byte, value bool)

Set sets one byte in the Bool byte map.

func (*Bool) String added in v0.23.2

func (m *Bool) String() string

func (*Bool) ToBitField added in v0.23.3

func (m *Bool) ToBitField() *BitField

ToBitField returns a BitField equivalent to m.

func (*Bool) ToMap

func (m *Bool) ToMap() map[byte]bool

ToMap makes a map[byte]bool from the bytemap.

func (*Bool) Write

func (m *Bool) Write(p []byte) (int, error)

Write satisfies io.Writer.

func (*Bool) WriteString

func (m *Bool) WriteString(s string) (n int, err error)

WriteString satisfies io.StringWriter.

type Float

type Float [Len]float64

Float is an array-backed map from byte to float64.

func (*Float) Clone

func (m *Float) Clone() *Float

Clone copies m.

func (*Float) Contains

func (m *Float) Contains(s string) bool

Contains reports whether all bytes in s are already in m.

func (*Float) ContainsBytes

func (m *Float) ContainsBytes(b []byte) bool

ContainsBytes reports whether all bytes in b are already in m.

func (*Float) ContainsReader

func (m *Float) ContainsReader(r io.Reader) (bool, error)

ContainsReader reports whether all bytes read from r are in m. If reading from r fails, it returns false and the error. If it reads to io.EOF and every byte was in m, it returns true, nil.

func (*Float) Equals

func (m *Float) Equals(other *Float) bool

Equals reports whether two Floats are equal.

func (*Float) Get

func (m *Float) Get(key byte) float64

Get looks up one byte in the Float byte map.

func (*Float) MostCommon

func (m *Float) MostCommon() []FloatN

MostCommon returns a slice of the bytes in m and their values, ordered from highest value to lowest.

func (*Float) Set

func (m *Float) Set(key byte, value float64)

Set sets one byte in the Float byte map.

func (*Float) SetFrequencies

func (m *Float) SetFrequencies()

SetFrequencies sets each value in m to its share of the total of all values in m, a frequency from 0 to 1.

Example
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/carlmjohnson/bytemap"
)

func main() {
	f, err := os.Open("testdata/moby-dick.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var freqmap bytemap.Float
	_, _ = io.Copy(&freqmap, f)
	freqmap.SetFrequencies()
	for _, freq := range freqmap.MostCommon() {
		if freq.N > 0.02 {
			fmt.Printf("%q: %04.1f%%\n", []byte{freq.Byte}, freq.N*100)
		}
	}
}
Output:

" ": 15.5%
"e": 09.3%
"t": 06.9%
"a": 06.0%
"o": 05.5%
"n": 05.2%
"i": 05.0%
"s": 05.0%
"h": 04.9%
"r": 04.1%
"l": 03.3%
"d": 03.0%
"u": 02.1%
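SetFrequencies, as described above, amounts to dividing every entry by the sum of all entries. A sketch under that assumption, with a hypothetical floatMap type; the early return when the total is zero is a choice made here to avoid producing NaN values, not documented package behavior.

```go
package main

import "fmt"

// floatMap is a hypothetical stand-in for bytemap.Float.
type floatMap [256]float64

// setFrequencies rescales m in place so each entry is its share of the
// total, giving values from 0 to 1 that sum to 1. If the total is 0,
// m is left unchanged rather than filled with NaN (0/0).
func (m *floatMap) setFrequencies() {
	var total float64
	for _, v := range m {
		total += v
	}
	if total == 0 {
		return
	}
	for i := range m {
		m[i] /= total
	}
}

func main() {
	var m floatMap
	for _, c := range []byte("aab") {
		m[c]++ // count each byte first, like writing into a Float
	}
	m.setFrequencies()
	fmt.Printf("%.3f %.3f\n", m['a'], m['b']) // 0.667 0.333
}
```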

func (*Float) ToBool

func (m *Float) ToBool() *Bool

ToBool makes a Bool from the bytemap.

func (*Float) ToMap

func (m *Float) ToMap() map[byte]float64

ToMap makes a map[byte]float64 from the bytemap.

func (*Float) Write

func (m *Float) Write(p []byte) (int, error)

Write satisfies io.Writer.

func (*Float) WriteString

func (m *Float) WriteString(s string) (n int, err error)

WriteString satisfies io.StringWriter.

type FloatN

type FloatN struct {
	Byte byte
	N    float64
}

type Int

type Int [Len]int

Int is an array-backed map from byte to integer.

func (*Int) Clone

func (m *Int) Clone() *Int

Clone copies m.

func (*Int) Contains

func (m *Int) Contains(s string) bool

Contains reports whether all bytes in s are already in m.

func (*Int) ContainsBytes

func (m *Int) ContainsBytes(b []byte) bool

ContainsBytes reports whether all bytes in b are already in m.

func (*Int) ContainsReader

func (m *Int) ContainsReader(r io.Reader) (bool, error)

ContainsReader reports whether all bytes read from r are in m. If reading from r fails, it returns false and the error. If it reads to io.EOF and every byte was in m, it returns true, nil.

func (*Int) Equals

func (m *Int) Equals(other *Int) bool

Equals reports whether two Ints are equal.

func (*Int) Get

func (m *Int) Get(key byte) int

Get looks up one byte in the Int byte map.

func (*Int) MostCommon

func (m *Int) MostCommon() []IntN

MostCommon returns a slice of character counts for m from highest count to lowest.

Example
package main

import (
	"fmt"
	"io"
	"strings"

	"github.com/carlmjohnson/bytemap"
)

func main() {
	var freqmap bytemap.Int
	r := strings.NewReader(`The quick brown fox jumps over the lazy dog.`)
	_, _ = io.Copy(&freqmap, r)
	for _, freq := range freqmap.MostCommon() {
		if freq.N > 0 {
			fmt.Printf("%q: %d\n", []byte{freq.Byte}, freq.N)
		}
	}
}
Output:

" ": 8
"o": 4
"e": 3
"h": 2
"r": 2
"u": 2
".": 1
"T": 1
"a": 1
"b": 1
"c": 1
"d": 1
"f": 1
"g": 1
"i": 1
"j": 1
"k": 1
"l": 1
"m": 1
"n": 1
"p": 1
"q": 1
"s": 1
"t": 1
"v": 1
"w": 1
"x": 1
"y": 1
"z": 1

func (*Int) Set

func (m *Int) Set(key byte, value int)

Set sets one byte in the Int byte map.

func (*Int) ToBool

func (m *Int) ToBool() *Bool

ToBool makes a Bool from the bytemap.

func (*Int) ToFloat

func (m *Int) ToFloat() *Float

ToFloat makes a Float from the bytemap.

func (*Int) ToMap

func (m *Int) ToMap() map[byte]int

ToMap makes a map[byte]int from the bytemap.

func (*Int) Write

func (m *Int) Write(p []byte) (int, error)

Write satisfies io.Writer.

func (*Int) WriteString

func (m *Int) WriteString(s string) (n int, err error)

WriteString satisfies io.StringWriter.

type IntN

type IntN struct {
	Byte byte
	N    int
}
