Documentation
¶
Overview ¶
Package simd provides access to SIMD-based implementations of several common operations on byte arrays which the compiler cannot be trusted to autovectorize within the next several years.
The backend assumes SSE4.2 is available: init() checks for SSE4.2 support, and panics when it isn't there. The interface is designed to allow the backend to also autodetect e.g. AVX2/AVX-512 and opportunistically use those instructions, without any changes to properly written higher-level code. Implementation of the AVX2 part of this is in progress.
The central constraint driving this package's design is the standard Go compiler's inability to inline short assembly functions; see
https://groups.google.com/forum/#!topic/golang-nuts/yVOfeHYCIT4 https://github.com/golang/go/issues/4947#issuecomment-66075571
for more details. As a consequence, it is critical to push looping logic down to the assembly level as well, otherwise function call overhead is overwhelming. Conversely, due to the much higher development burden of this type of code, we don't go beyond that point; this package is essentially a collection of for loops.
Two classes of functions are exported:
- Functions with 'Unsafe' in their names are very performant, but are memory-unsafe, do not validate documented preconditions, and may have the unusual property of reading/writing to a few bytes *past* the end of the given slices. The MakeUnsafe() function and its relatives allocate byte-slices with sufficient extra capacity for all Unsafe functions with the latter property to work properly.
- Their safe analogues work properly on ordinary slices, and often panic when documented preconditions are not met. When a precondition is not explicitly checked (due to computational cost), safe functions may return garbage values when the condition is not met, but they are memory-safe: they will not corrupt unrelated memory or perform out-of-bounds read operations.
Index ¶
- Constants
- func Accumulate8(src []byte) int
- func Accumulate8Greater(src []byte, val byte) int
- func AddConst8(dst, src []byte, val byte)
- func AddConst8Inplace(main []byte, val byte)
- func AddConst8Unsafe(dst, src []byte, val byte)
- func AddConst8UnsafeInplace(main []byte, val byte)
- func And(dst, src1, src2 []byte)
- func AndConst8(dst, src []byte, val byte)
- func AndConst8Inplace(main []byte, val byte)
- func AndConst8Unsafe(dst, src []byte, val byte)
- func AndConst8UnsafeInplace(main []byte, val byte)
- func AndInplace(main, arg []byte)
- func AndUnsafe(dst, src1, src2 []byte)
- func AndUnsafeInplace(main, arg []byte)
- func BitFromEveryByte(dst, src []byte, bitIdx int)
- func BytesPerVec() int
- func Count2Bytes(src []byte, val1, val2 byte) int
- func Count3Bytes(src []byte, val1, val2, val3 byte) int
- func CountNibblesInSet(src []byte, tablePtr *NibbleLookupTable) int
- func CountNibblesInTwoSets(src []byte, table1Ptr, table2Ptr *NibbleLookupTable) (int, int)
- func CountUnpackedNibblesInSet(src []byte, tablePtr *NibbleLookupTable) int
- func CountUnpackedNibblesInTwoSets(src []byte, table1Ptr, table2Ptr *NibbleLookupTable) (int, int)
- func DivUpPow2(dividend, divisor int, log2Divisor uint) int
- func FindNaNOrInf64(data []float64) int
- func FirstGreater8(arg []byte, val byte, startPos int) int
- func FirstGreater8Unsafe(arg []byte, val byte, startPos int) int
- func FirstLeq8(arg []byte, val byte, startPos int) int
- func FirstLeq8Unsafe(arg []byte, val byte, startPos int) int
- func FirstUnequal8(arg1, arg2 []byte, startPos int) int
- func FirstUnequal8Unsafe(arg1, arg2 []byte, startPos int) int
- func IndexU16(main []uint16, val uint16) int
- func Interleave8(dst, even, odd []byte)
- func Interleave8Unsafe(dst, even, odd []byte)
- func Invmask(dst, src1, src2 []byte)
- func InvmaskConst8(dst, src []byte, val byte)
- func InvmaskConst8Inplace(main []byte, val byte)
- func InvmaskConst8Unsafe(dst, src []byte, val byte)
- func InvmaskConst8UnsafeInplace(main []byte, val byte)
- func InvmaskInplace(main, arg []byte)
- func InvmaskUnsafe(dst, src1, src2 []byte)
- func InvmaskUnsafeInplace(main, arg []byte)
- func MakeUnsafe(len int) []byte
- func MaskThenCountByte(src []byte, mask, val byte) int
- func Memset16Raw(dst, valPtr unsafe.Pointer, nElem int)
- func Memset32Raw(dst, valPtr unsafe.Pointer, nElem int)
- func Memset8(dst []byte, val byte)
- func Memset8Unsafe(dst []byte, val byte)
- func Or(dst, src1, src2 []byte)
- func OrConst8(dst, src []byte, val byte)
- func OrConst8Inplace(main []byte, val byte)
- func OrConst8Unsafe(dst, src []byte, val byte)
- func OrConst8UnsafeInplace(main []byte, val byte)
- func OrInplace(main, arg []byte)
- func OrUnsafe(dst, src1, src2 []byte)
- func OrUnsafeInplace(main, arg []byte)
- func PackedNibbleLookup(dst, src []byte, tablePtr *NibbleLookupTable)
- func PackedNibbleLookupUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)
- func Popcnt(bytes []byte) int
- func PopcntUnsafe(bytes []byte) int
- func RemakeUnsafe(bufptr *[]byte, len int)
- func RepeatI16(dst []int16, val int16)
- func RepeatU16(dst []uint16, val uint16)
- func ResizeUnsafe(bufptr *[]byte, len int)
- func Reverse16InplaceRaw(main unsafe.Pointer, nElem int)
- func Reverse16Raw(dst, src unsafe.Pointer, nElem int)
- func Reverse8(dst, src []byte)
- func Reverse8Inplace(main []byte)
- func Reverse8Unsafe(dst, src []byte)
- func ReverseI16(dst, src []int16)
- func ReverseI16Inplace(main []int16)
- func ReverseU16(dst, src []uint16)
- func ReverseU16Inplace(main []uint16)
- func RoundUpPow2(val, alignment int) int
- func SubtractFromConst8(dst, src []byte, val byte)
- func SubtractFromConst8Inplace(main []byte, val byte)
- func SubtractFromConst8Unsafe(dst, src []byte, val byte)
- func SubtractFromConst8UnsafeInplace(main []byte, val byte)
- func UnpackedNibbleLookup(dst, src []byte, tablePtr *NibbleLookupTable)
- func UnpackedNibbleLookupInplace(main []byte, tablePtr *NibbleLookupTable)
- func UnpackedNibbleLookupS(dst []byte, src string, tablePtr *NibbleLookupTable)
- func UnpackedNibbleLookupUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)
- func UnpackedNibbleLookupUnsafeInplace(main []byte, tablePtr *NibbleLookupTable)
- func XcapUnsafe(bufptr *[]byte)
- func Xor(dst, src1, src2 []byte)
- func XorConst8(dst, src []byte, val byte)
- func XorConst8Inplace(main []byte, val byte)
- func XorConst8Unsafe(dst, src []byte, val byte)
- func XorConst8UnsafeInplace(main []byte, val byte)
- func XorInplace(main, arg []byte)
- func XorUnsafe(dst, src1, src2 []byte)
- func XorUnsafeInplace(main, arg []byte)
- type NibbleLookupTable
Constants ¶
const BitsPerWord = BytesPerWord * 8
BitsPerWord is the number of bits in a machine word.
const BytesPerWord = 8
BytesPerWord is the number of bytes in a machine word. We don't use unsafe.Sizeof(uintptr(1)) since there are advantages to having this as an untyped constant, and there's essentially no drawback since this is an _amd64-specific file.
const Log2BytesPerWord = uint(3)
Log2BytesPerWord is log2(BytesPerWord). This is relevant for manual bit-shifting when we know that's a safe way to divide and the compiler does not (e.g. dividend is of signed int type).
Variables ¶
This section is empty.
Functions ¶
func Accumulate8 ¶
Accumulate8 returns the sum of the (unsigned) bytes in src[].
func Accumulate8Greater ¶
Accumulate8Greater returns the sum of all bytes in src[] greater than the given value.
func AddConst8 ¶
AddConst8 sets dst[pos] := src[pos] + val for every byte in src (with the usual unsigned overflow). It panics if len(src) != len(dst).
func AddConst8Inplace ¶
AddConst8Inplace adds the given constant to every byte of main[], with unsigned overflow.
func AddConst8Unsafe ¶
AddConst8Unsafe sets dst[pos] := src[pos] + val for every byte in src (with the usual unsigned overflow).
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe() or XcapUnsafe(), and the same is true for dst[].
1. len(src) and len(dst) are equal.
2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func AddConst8UnsafeInplace ¶
AddConst8UnsafeInplace adds the given constant to every byte of main[], with unsigned overflow.
WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(main), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of main[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().
func And ¶
func And(dst, src1, src2 []byte)
And sets dst[pos] := src1[pos] & src2[pos] for every position in dst. It panics if slice lengths don't match.
func AndConst8 ¶
AndConst8 sets dst[pos] := src[pos] & val for every position in dst. It panics if slice lengths don't match.
func AndConst8Inplace ¶
AndConst8Inplace sets main[pos] := main[pos] & val for every position in main[].
func AndConst8Unsafe ¶
AndConst8Unsafe sets dst[pos] := src[pos] & val for every position in dst.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].
1. len(src) and len(dst) must be equal.
2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func AndConst8UnsafeInplace ¶
AndConst8UnsafeInplace sets main[pos] := main[pos] & val for every position in main[].
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().
1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).
2. The caller does not care if a few bytes past the end of main[] are changed.
func AndInplace ¶
func AndInplace(main, arg []byte)
AndInplace sets main[pos] := arg[pos] & main[pos] for every position in main[]. It panics if slice lengths don't match.
func AndUnsafe ¶
func AndUnsafe(dst, src1, src2 []byte)
AndUnsafe sets dst[pos] := src1[pos] & src2[pos] for every position in dst.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].
1. len(src1), len(src2), and len(dst) must be equal.
2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func AndUnsafeInplace ¶
func AndUnsafeInplace(main, arg []byte)
AndUnsafeInplace sets main[pos] := main[pos] & arg[pos] for every position in main[].
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].
1. len(arg) and len(main) must be equal.
2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of main[] are changed.
func BitFromEveryByte ¶
BitFromEveryByte fills dst[] with a bitarray containing every 8th bit from src[], starting with bitIdx, where bitIdx is in [0,7]. If len(src) is not divisible by 8, extra bits in the last filled byte of dst are set to zero.
For example, if src[] is
0x1f 0x33 0x0d 0x00 0x51 0xcc 0x34 0x59 0x44
and bitIdx is 2, bit 2 from every byte is
1 0 1 0 0 1 1 0 1
so dst[] is filled with
0x65 0x01.
- It panics if len(dst) < (len(src) + 7) / 8, or bitIdx isn't in [0,7]. - If dst is larger than necessary, the extra bytes are not changed.
func BytesPerVec ¶
func BytesPerVec() int
BytesPerVec is an accessor for the bytesPerVec package variable.
func Count2Bytes ¶
Count2Bytes counts the number of bytes in src[] which are equal to either val1 or val2. (bytes.Count() should be good enough for a single byte.)
func Count3Bytes ¶
Count3Bytes counts the number of bytes in src[] which are equal to val1, val2, or val3.
func CountNibblesInSet ¶
func CountNibblesInSet(src []byte, tablePtr *NibbleLookupTable) int
CountNibblesInSet counts the number of nibbles in src[] which are in the given set. The set must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.
WARNING: This function does not validate the table. It may return a garbage result on invalid input. (However, it won't corrupt memory.)
func CountNibblesInTwoSets ¶
func CountNibblesInTwoSets(src []byte, table1Ptr, table2Ptr *NibbleLookupTable) (int, int)
CountNibblesInTwoSets counts the number of bytes in src[] which are in the given two sets, assuming all bytes are <16. The sets must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.
WARNING: This function does not validate the tables. It may crash or return garbage results on invalid input. (However, it won't corrupt memory.)
func CountUnpackedNibblesInSet ¶
func CountUnpackedNibblesInSet(src []byte, tablePtr *NibbleLookupTable) int
CountUnpackedNibblesInSet counts the number of bytes in src[] which are in the given set, assuming all bytes are <16. The set must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.
WARNING: This function does not validate the table. It may crash or return a garbage result on invalid input. (However, it won't corrupt memory.)
func CountUnpackedNibblesInTwoSets ¶
func CountUnpackedNibblesInTwoSets(src []byte, table1Ptr, table2Ptr *NibbleLookupTable) (int, int)
CountUnpackedNibblesInTwoSets counts the number of bytes in src[] which are in the given two sets, assuming all bytes are <16. The sets must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.
WARNING: This function does not validate the tables. It may crash or return garbage results on invalid input. (However, it won't corrupt memory.)
func DivUpPow2 ¶
DivUpPow2 efficiently divides a number by a power-of-2 divisor. (This works for negative dividends since the language specifies arithmetic right-shifts of signed numbers. I'm pretty sure this doesn't have a performance penalty.)
func FindNaNOrInf64 ¶ added in v0.0.11
FindNaNOrInf64 returns the position of the first NaN/inf value if one is present, and -1 otherwise.
func FirstGreater8 ¶
FirstGreater8 scans arg[startPos:] for the first value larger than the given constant, returning its position if one is found, or len(arg) if all bytes are <= (or startPos >= len).
This should only be used when greater values are usually present at ~5% or lower frequency. Above that, use a simple for loop.
func FirstGreater8Unsafe ¶
FirstGreater8Unsafe scans arg[startPos:] for the first value larger than the given constant, returning its position if one is found, or len(arg) if all bytes are <= (or startPos >= len).
This should only be used when greater values are usually present at ~5% or lower frequency. Above that, use a simple for loop.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. The second assumption is always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().
1. startPos is nonnegative.
2. cap(arg) >= RoundUpPow2(len, bytesPerVec).
func FirstLeq8 ¶
FirstLeq8 scans arg[startPos:] for the first value <= the given constant, returning its position if one is found, or len(arg) if all bytes are greater (or startPos >= len).
This should only be used when <= values are usually present at ~5% or lower frequency. Above that, use a simple for loop.
func FirstLeq8Unsafe ¶
FirstLeq8Unsafe scans arg[startPos:] for the first value <= the given constant, returning its position if one is found, or len(arg) if all bytes are greater (or startPos >= len).
This should only be used when <= values are usually present at ~5% or lower frequency. Above that, use a simple for loop.
See warning for FirstGreater8Unsafe.
func FirstUnequal8 ¶
FirstUnequal8 scans arg1[startPos:] and arg2[startPos:] for the first mismatching byte, returning its position if one is found, or the common length if all bytes match (or startPos >= len). It panics if the lengths are not identical, or startPos is negative.
This is essentially an extension of bytes.Compare().
func FirstUnequal8Unsafe ¶
FirstUnequal8Unsafe scans arg1[startPos:] and arg2[startPos:] for the first mismatching byte, returning its position if one is found, or the common length if all bytes match (or startPos >= len). This has essentially the same speed as bytes.Compare().
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. The second assumption is always satisfied when the last potentially-size-increasing operation on arg1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for arg2[].
1. len(arg1) == len(arg2).
2. Capacities are at least RoundUpPow2(len, bytesPerVec).
func IndexU16 ¶ added in v0.0.10
IndexU16 returns the index of the first instance of val in main, or -1 if val is not present in main.
func Interleave8 ¶
func Interleave8(dst, even, odd []byte)
Interleave8 sets the bytes in dst[] as follows:
if pos is even, dst[pos] := even[pos/2] if pos is odd, dst[pos] := odd[pos/2]
It panics if ((len(dst) + 1) / 2) != len(even), or (len(dst) / 2) != len(odd).
func Interleave8Unsafe ¶
func Interleave8Unsafe(dst, even, odd []byte)
Interleave8Unsafe sets the bytes in dst[] as follows:
if pos is even, dst[pos] := even[pos/2] if pos is odd, dst[pos] := odd[pos/2]
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on dst[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for even[] and odd[].
1. len(even) = (len(dst) + 1) / 2, and len(odd) = len(dst) / 2.
2. cap(dst) >= RoundUpPow2(len(dst) + 1, bytesPerVec), cap(even) >= RoundUpPow2(len(even) + 1, bytesPerVec), and cap(odd) >= RoundUpPow2(len(odd) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func Invmask ¶
func Invmask(dst, src1, src2 []byte)
Invmask sets dst[pos] := src1[pos] &^ src2[pos] for every position in dst. It panics if slice lengths don't match.
func InvmaskConst8 ¶
InvmaskConst8 sets dst[pos] := src[pos] &^ val for every position in dst. It panics if slice lengths don't match.
func InvmaskConst8Inplace ¶
InvmaskConst8Inplace sets main[pos] := main[pos] &^ val for every position in main[].
func InvmaskConst8Unsafe ¶
InvmaskConst8Unsafe sets dst[pos] := src[pos] &^ val for every position in dst.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].
1. len(src) and len(dst) must be equal.
2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func InvmaskConst8UnsafeInplace ¶
InvmaskConst8UnsafeInplace sets main[pos] := main[pos] &^ val for every position in main[].
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().
1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).
2. The caller does not care if a few bytes past the end of main[] are changed.
func InvmaskInplace ¶
func InvmaskInplace(main, arg []byte)
InvmaskInplace sets main[pos] := arg[pos] &^ main[pos] for every position in main[]. It panics if slice lengths don't match.
func InvmaskUnsafe ¶
func InvmaskUnsafe(dst, src1, src2 []byte)
InvmaskUnsafe sets dst[pos] := src1[pos] &^ src2[pos] for every position in dst.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].
1. len(src1), len(src2), and len(dst) must be equal.
2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func InvmaskUnsafeInplace ¶
func InvmaskUnsafeInplace(main, arg []byte)
InvmaskUnsafeInplace sets main[pos] := main[pos] &^ arg[pos] for every position in main[].
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].
1. len(arg) and len(main) must be equal.
2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of main[] are changed.
func MakeUnsafe ¶
MakeUnsafe returns a byte slice of the given length which is guaranteed to have enough capacity for all Unsafe functions in this package to work. (It is not itself an unsafe function: allocated memory is zero-initialized.) Note that Unsafe functions occasionally have other caveats: e.g. PopcntUnsafe also requires relevant bytes past the end of the slice to be zeroed out.
func MaskThenCountByte ¶
MaskThenCountByte counts the number of bytes in src[] satisfying
src[pos] & mask == val.
func Memset16Raw ¶
Memset16Raw assumes dst points to an array of nElem 2-byte elements, and valPtr points to a single 2-byte element. It fills dst with copies of *valPtr.
func Memset32Raw ¶
Memset32Raw assumes dst points to an array of nElem 4-byte elements, and valPtr points to a single 4-byte element. It fills dst with copies of *valPtr.
func Memset8 ¶
Memset8 sets all values of dst[] to the given byte. (This is intended for val != 0. It is better to use a range-for loop for val == 0 since the compiler has a hardcoded optimization for that case.)
func Memset8Unsafe ¶
Memset8Unsafe sets all values of dst[] to the given byte. (This is intended for val != 0. It is better to use a range-for loop for val == 0 since the compiler has a hardcoded optimization for that case; see https://github.com/golang/go/issues/5373 .)
WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(dst), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of dst[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on dst[] is {Make,Remake}Unsafe(), ResizeUnsafe(), or XcapUnsafe().
func Or ¶
func Or(dst, src1, src2 []byte)
Or sets dst[pos] := src1[pos] | src2[pos] for every position in dst. It panics if slice lengths don't match.
func OrConst8 ¶
OrConst8 sets dst[pos] := src[pos] | val for every position in dst. It panics if slice lengths don't match.
func OrConst8Inplace ¶
OrConst8Inplace sets main[pos] := main[pos] | val for every position in main[].
func OrConst8Unsafe ¶
OrConst8Unsafe sets dst[pos] := src[pos] | val for every position in dst.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].
1. len(src) and len(dst) must be equal.
2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func OrConst8UnsafeInplace ¶
OrConst8UnsafeInplace sets main[pos] := main[pos] | val for every position in main[].
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().
1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).
2. The caller does not care if a few bytes past the end of main[] are changed.
func OrInplace ¶
func OrInplace(main, arg []byte)
OrInplace sets main[pos] := arg[pos] | main[pos] for every position in main[]. It panics if slice lengths don't match.
func OrUnsafe ¶
func OrUnsafe(dst, src1, src2 []byte)
OrUnsafe sets dst[pos] := src1[pos] | src2[pos] for every position in dst.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].
1. len(src1), len(src2), and len(dst) must be equal.
2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func OrUnsafeInplace ¶
func OrUnsafeInplace(main, arg []byte)
OrUnsafeInplace sets main[pos] := main[pos] | arg[pos] for every position in main[].
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].
1. len(arg) and len(main) must be equal.
2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of main[] are changed.
func PackedNibbleLookup ¶
func PackedNibbleLookup(dst, src []byte, tablePtr *NibbleLookupTable)
PackedNibbleLookup sets the bytes in dst[] as follows:
if pos is even, dst[pos] := table[src[pos / 2] & 15] if pos is odd, dst[pos] := table[src[pos / 2] >> 4]
It panics if len(src) != (len(dst) + 1) / 2.
Nothing bad happens if len(dst) is odd and some high bits in the last src[] byte are set, though it's generally good practice to ensure that case doesn't come up.
func PackedNibbleLookupUnsafe ¶
func PackedNibbleLookupUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)
PackedNibbleLookupUnsafe sets the bytes in dst[] as follows:
if pos is even, dst[pos] := table[src[pos / 2] & 15] if pos is odd, dst[pos] := table[src[pos / 2] >> 4]
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-#3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].
1. len(src) == (len(dst) + 1) / 2.
2. Capacity of src is at least RoundUpPow2(len(src) + 1, bytesPerVec), and the same is true for dst.
3. The caller does not care if a few bytes past the end of dst[] are changed.
func Popcnt ¶
Popcnt returns the number of set bits in the given []byte.
Some effort has been made to make this run acceptably fast on relatively short arrays, since I expect knowing how to do so to be helpful when working with hundreds of millions of .bam reads with ~75 bytes of base data and ~150 bytes of quality data. Interestingly, moving the leading-byte handling code to assembly didn't make a difference.
Some single-threaded benchmark results calling Popcnt 99999999 times on a 14-byte unaligned array:
C implementation: 0.219-0.232s This code: 0.606-0.620s C implementation using memcpy for trailing bytes: 0.964-0.983s
So Go's extra looping and function call overhead can almost triple runtime in the short-array limit, but that's actually not as bad as the 4.5x overhead of trusting memcpy to handle trailing bytes.
func PopcntUnsafe ¶
PopcntUnsafe returns the number of set bits in the given []byte, assuming that any trailing bytes up to the next multiple of BytesPerWord are zeroed out.
func RemakeUnsafe ¶
RemakeUnsafe reuses the given buffer if it has sufficient capacity; otherwise it does the same thing as MakeUnsafe. It does NOT preserve existing contents of buf[]; use ResizeUnsafe() for that.
func ResizeUnsafe ¶
ResizeUnsafe changes the length of buf and ensures it has enough extra capacity to be passed to this package's Unsafe functions. Existing buf[] contents are preserved (with possible truncation), though when length is increased, new bytes might not be zero-initialized.
func Reverse16InplaceRaw ¶
Reverse16InplaceRaw assumes main points to an array of ct 2-byte elements, and reverses it in-place.
func Reverse16Raw ¶
Reverse16Raw assumes dst and src both point to arrays of ct 2-byte elements, and sets dst[pos] := src[ct - 1 - pos] for each position.
func Reverse8 ¶
func Reverse8(dst, src []byte)
Reverse8 sets dst[pos] := src[len(src) - 1 - pos] for every position in src. It panics if len(src) != len(dst).
func Reverse8Inplace ¶
func Reverse8Inplace(main []byte)
Reverse8Inplace reverses the bytes in main[]. (There is no unsafe version of this function.)
func Reverse8Unsafe ¶
func Reverse8Unsafe(dst, src []byte)
Reverse8Unsafe sets dst[pos] := src[len(src) - 1 - pos] for every position in src.
WARNING: This does not verify len(dst) == len(src); call the safe version of this function if you want that.
func ReverseI16 ¶
func ReverseI16(dst, src []int16)
ReverseI16 sets dst[len(src) - 1 - pos] := src[pos] for each position in src. It panics if len(src) != len(dst).
func ReverseI16Inplace ¶
func ReverseI16Inplace(main []int16)
ReverseI16Inplace reverses a []int16 in-place.
func ReverseU16 ¶
func ReverseU16(dst, src []uint16)
ReverseU16 sets dst[len(src) - 1 - pos] := src[pos] for each position in src. It panics if len(src) != len(dst).
func ReverseU16Inplace ¶
func ReverseU16Inplace(main []uint16)
ReverseU16Inplace reverses a []uint16 in-place.
func RoundUpPow2 ¶
RoundUpPow2 returns val rounded up to a multiple of alignment, assuming alignment is a power of 2.
func SubtractFromConst8 ¶
SubtractFromConst8 sets dst[pos] := val - src[pos] for every byte in src (with the usual unsigned overflow). It panics if len(src) != len(dst).
func SubtractFromConst8Inplace ¶
SubtractFromConst8Inplace subtracts every byte of main[] from the given constant, with unsigned underflow.
func SubtractFromConst8Unsafe ¶
SubtractFromConst8Unsafe sets dst[pos] := val - src[pos] for every byte in src (with the usual unsigned overflow).
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe() or XcapUnsafe(), and the same is true for dst[].
1. len(src) and len(dst) are equal.
2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func SubtractFromConst8UnsafeInplace ¶
SubtractFromConst8UnsafeInplace subtracts every byte of main[] from the given constant, with unsigned underflow.
WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(main), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of main[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().
func UnpackedNibbleLookup ¶
func UnpackedNibbleLookup(dst, src []byte, tablePtr *NibbleLookupTable)
UnpackedNibbleLookup sets the bytes in dst[] as follows:
if src[pos] < 128, set dst[pos] := table[src[pos] & 15] otherwise, set dst[pos] := 0
It panics if len(src) != len(dst).
func UnpackedNibbleLookupInplace ¶
func UnpackedNibbleLookupInplace(main []byte, tablePtr *NibbleLookupTable)
UnpackedNibbleLookupInplace replaces the bytes in main[] as follows:
if value < 128, set to table[value & 15] otherwise, set to 0
func UnpackedNibbleLookupS ¶ added in v0.0.10
func UnpackedNibbleLookupS(dst []byte, src string, tablePtr *NibbleLookupTable)
UnpackedNibbleLookupS is a variant of UnpackedNibbleLookup() that takes string src.
func UnpackedNibbleLookupUnsafe ¶
func UnpackedNibbleLookupUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)
UnpackedNibbleLookupUnsafe sets the bytes in dst[] as follows:
if src[pos] < 128, set dst[pos] := table[src[pos] & 15] otherwise, set dst[pos] := 0
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].
1. len(src) and len(dst) are equal.
2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func UnpackedNibbleLookupUnsafeInplace ¶
func UnpackedNibbleLookupUnsafeInplace(main []byte, tablePtr *NibbleLookupTable)
UnpackedNibbleLookupUnsafeInplace replaces the bytes in main[] as follows:
if value < 128, set to table[value & 15] otherwise, set to 0
WARNING: This is a function designed to be used in inner loops, which makes assumptions about capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Make,Remake}Unsafe(), ResizeUnsafe(), or XcapUnsafe().
1. cap(main) must be at least RoundUpPow2(len(main) + 1, bytesPerVec).
2. The caller does not care if a few bytes past the end of main[] are changed.
func XcapUnsafe ¶
func XcapUnsafe(bufptr *[]byte)
XcapUnsafe is shorthand for ResizeUnsafe's most common use case (no length change, just want to ensure sufficient capacity).
func Xor ¶
func Xor(dst, src1, src2 []byte)
Xor sets dst[pos] := src1[pos] ^ src2[pos] for every position in dst. It panics if slice lengths don't match.
func XorConst8 ¶
XorConst8 sets dst[pos] := src[pos] ^ val for every position in dst. It panics if slice lengths don't match.
func XorConst8Inplace ¶
XorConst8Inplace sets main[pos] := main[pos] ^ val for every position in main[].
func XorConst8Unsafe ¶
XorConst8Unsafe sets dst[pos] := src[pos] ^ val for every position in dst.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].
1. len(src) and len(dst) must be equal.
2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func XorConst8UnsafeInplace ¶
XorConst8UnsafeInplace sets main[pos] := main[pos] ^ val for every position in main[].
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().
1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).
2. The caller does not care if a few bytes past the end of main[] are changed.
func XorInplace ¶
func XorInplace(main, arg []byte)
XorInplace sets main[pos] := arg[pos] ^ main[pos] for every position in main[]. It panics if slice lengths don't match.
func XorUnsafe ¶
func XorUnsafe(dst, src1, src2 []byte)
XorUnsafe sets dst[pos] := src1[pos] ^ src2[pos] for every position in dst.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].
1. len(src1), len(src2), and len(dst) must be equal.
2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of dst[] are changed.
func XorUnsafeInplace ¶
func XorUnsafeInplace(main, arg []byte)
XorUnsafeInplace sets main[pos] := main[pos] ^ arg[pos] for every position in main[].
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].
1. len(arg) and len(main) must be equal.
2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of main[] are changed.
Types ¶
type NibbleLookupTable ¶
type NibbleLookupTable struct {
// contains filtered or unexported fields
}
NibbleLookupTable represents a parallel-byte-substitution operation f, where every byte b in a byte-slice is replaced with
f(b) := shuffle[0][b & 15] for b <= 127, and f(b) := 0 for b > 127.
(The second part is usually irrelevant in practice, but must be defined this way to allow _mm_shuffle_epi8()/_mm256_shuffle_epi8()/_mm512_shuffle_epi8() to be used to implement the operation efficiently.) It's named NibbleLookupTable rather than ByteLookupTable since only the bottom nibble of each byte can be used for table lookup. It potentially stores multiple adjacent copies of the lookup table since that speeds up the AVX2 and AVX-512 use cases (the table can be loaded with a single _mm256_loadu_si256 operation, instead of e.g. _mm_loadu_si128 followed by _mm256_set_m128i with the same argument twice), and the typical use case involves initializing very few tables and using them many, many times.
func MakeNibbleLookupTable ¶
func MakeNibbleLookupTable(table [16]byte) (t NibbleLookupTable)
MakeNibbleLookupTable generates a NibbleLookupTable from a [16]byte.
func (*NibbleLookupTable) Get ¶
func (t *NibbleLookupTable) Get(b byte) byte
Get performs the b <= 127 part of the lookup operation described above. The b > 127 branch is omitted because in many use cases (e.g. PackedNibbleLookup below), it can be proven that b > 127 is impossible, and removing the if-statement is a significant performance win when it's possible.