Documentation ¶
Overview ¶
Package biosimd provides access to SIMD-based implementations of several common .bam/.fa/etc.-specific operations on byte arrays which the compiler cannot be trusted to autovectorize within the next several years.
See base/simd/doc.go for more comments on the overall design.
Index ¶
- Constants
- Variables
- func ASCIITo2bit(dst, src []byte)
- func ASCIIToSeq8(dst, src []byte)
- func ASCIIToSeq8Inplace(main []byte)
- func CleanASCIISeqInplace(ascii8 []byte)
- func CleanASCIISeqNoCapitalizeInplace(ascii8 []byte)
- func FillFastqRecordBodyFromNibbles(dst, src []byte, nBase int, baseTablePtr, qualTablePtr *NibbleLookupTable)
- func IsNonACGTNPresent(ascii8 []byte) bool
- func IsNonACGTPresent(ascii8 []byte) bool
- func PackSeq(dst, src []byte)
- func PackSeqUnsafe(dst, src []byte)
- func PackedSeqCount(seq4 []byte, tablePtr *NibbleLookupTable, startPos, endPos int) int
- func PackedSeqCountTwo(seq4 []byte, table1Ptr, table2Ptr *NibbleLookupTable, startPos, endPos int) (int, int)
- func ReverseComp2(dst, src []byte)
- func ReverseComp2Inplace(acgt8 []byte)
- func ReverseComp2Unsafe(dst, src []byte)
- func ReverseComp2UnsafeInplace(acgt8 []byte)
- func ReverseComp4(dst, src []byte)
- func ReverseComp4Inplace(seq8 []byte)
- func ReverseComp4Unsafe(dst, src []byte)
- func ReverseComp4UnsafeInplace(seq8 []byte)
- func ReverseComp8Inplace(ascii8 []byte)
- func ReverseComp8InplaceNoValidate(ascii8 []byte)
- func ReverseComp8NoValidate(dst, src []byte)
- func UnpackAndReplaceSeq(dst, src []byte, tablePtr *NibbleLookupTable)
- func UnpackAndReplaceSeqSubset(dst, src []byte, tablePtr *NibbleLookupTable, startPos, endPos int)
- func UnpackAndReplaceSeqUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)
- func UnpackSeq(dst, src []byte)
- func UnpackSeqUnsafe(dst, src []byte)
- type NibbleLookupTable
Constants ¶
const BytesPerWord = simd.BytesPerWord
BytesPerWord is the number of bytes in a machine word.
const Log2BytesPerWord = simd.Log2BytesPerWord
Log2BytesPerWord is log2(BytesPerWord). This is relevant for manual bit-shifting when we know that's a safe way to divide and the compiler does not (e.g. dividend is of signed int type).
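The point about signed dividends can be illustrated with a small sketch. The constant values below assume a 64-bit platform (8 bytes per word); they are stand-ins for illustration, not values read from base/simd, and the two helper functions are hypothetical.

```go
package main

import "fmt"

// Assumed values for a 64-bit platform; the real constants come from base/simd.
const (
	bytesPerWord     = 8
	log2BytesPerWord = 3
)

// wordCountDiv divides with the / operator. For a signed dividend the result
// is rounded toward zero, which costs the compiler a few extra instructions.
func wordCountDiv(nByte int) int { return nByte / bytesPerWord }

// wordCountShift divides with an arithmetic right shift, which rounds toward
// negative infinity. It agrees with wordCountDiv only when nByte >= 0.
func wordCountShift(nByte int) int { return nByte >> log2BytesPerWord }

func main() {
	fmt.Println(wordCountDiv(100), wordCountShift(100)) // 12 12
	fmt.Println(wordCountDiv(-9), wordCountShift(-9))   // -1 -2
}
```

The shift is therefore only a safe substitute when the caller already knows the dividend is non-negative, as is the case for slice lengths.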
Variables ¶
var (
	// SeqASCIITable maps 4-bit seq[] values to their ASCII representations.
	// It's a common argument for UnpackAndReplaceSeq().
	SeqASCIITable = MakeNibbleLookupTable([16]byte{
		'=', 'A', 'C', 'M', 'G', 'R', 'S', 'V', 'T', 'W', 'Y', 'H', 'K', 'D', 'B', 'N'})
)
Functions ¶
func ASCIITo2bit ¶
func ASCIITo2bit(dst, src []byte)
ASCIITo2bit sets the bytes in dst[] as follows:
if pos is congruent to 0 mod 4, little-endian bits 0-1 of dst[pos / 4] :=
  0 if src[pos] == 'A'/'a'
  1 if src[pos] == 'C'/'c'
  2 if src[pos] == 'G'/'g'
  3 if src[pos] == 'T'/'t'
similarly, if pos is congruent to 1 mod 4, src[pos] controls bits 2-3 of dst[pos / 4], etc.
trailing high bits of the last byte are set to zero.
It panics if len(dst) != (len(src) + 3) / 4.
WARNING: This does not verify that all input characters are in {'A', 'C', 'G', 'T', 'a', 'c', 'g', 't'}. Results are arbitrary if any input characters are invalid, though the function is still memory-safe in that event.
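As a rough illustration of the bit layout described above, here is a scalar reference sketch. asciiTo2bitRef is a hypothetical helper, not part of the package: it allocates its own output and maps invalid characters to 0, whereas the real function leaves that case unspecified.

```go
package main

import "fmt"

// asciiTo2bitRef is a hypothetical scalar reference for the 2-bit layout.
// It returns a freshly allocated slice of (len(src) + 3) / 4 bytes.
func asciiTo2bitRef(src []byte) []byte {
	dst := make([]byte, (len(src)+3)/4)
	for pos, c := range src {
		var code byte
		switch c {
		case 'A', 'a':
			code = 0
		case 'C', 'c':
			code = 1
		case 'G', 'g':
			code = 2
		case 'T', 't':
			code = 3
		default:
			// Arbitrary in the real function; this sketch leaves code at 0.
		}
		// Base pos occupies bits 2*(pos%4) and 2*(pos%4)+1 of dst[pos/4].
		dst[pos/4] |= code << (2 * uint(pos%4))
	}
	return dst
}

func main() {
	packed := asciiTo2bitRef([]byte("ACGT"))
	fmt.Printf("%08b\n", packed[0]) // 11100100

	packed = asciiTo2bitRef([]byte("TGCAt"))
	fmt.Println(len(packed)) // 2
}
```

In the second call the fifth base lands in the low two bits of the second byte, and the remaining high bits of that byte stay zero.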
func ASCIIToSeq8 ¶
func ASCIIToSeq8(dst, src []byte)
ASCIIToSeq8 sets dst[pos] as follows:
src[pos] == 'A'/'a': dst[pos] == 1
src[pos] == 'C'/'c': dst[pos] == 2
src[pos] == 'G'/'g': dst[pos] == 4
src[pos] == 'T'/'t': dst[pos] == 8
src[pos] == anything else: dst[pos] == 15
It panics if len(dst) != len(src).
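The mapping can be written out as a scalar loop. asciiToSeq8Ref below is a hypothetical reference sketch of the documented behavior, including the length check; it says nothing about how the SIMD version is implemented.

```go
package main

import "fmt"

// asciiToSeq8Ref is a hypothetical scalar reference for the ASCII -> seq8
// mapping: one output byte per input byte, using .bam 4-bit base codes.
func asciiToSeq8Ref(dst, src []byte) {
	if len(dst) != len(src) {
		panic("asciiToSeq8Ref: len(dst) != len(src)")
	}
	for pos, c := range src {
		switch c {
		case 'A', 'a':
			dst[pos] = 1
		case 'C', 'c':
			dst[pos] = 2
		case 'G', 'g':
			dst[pos] = 4
		case 'T', 't':
			dst[pos] = 8
		default:
			dst[pos] = 15 // code for 'N'
		}
	}
}

func main() {
	src := []byte("acgTNx")
	dst := make([]byte, len(src))
	asciiToSeq8Ref(dst, src)
	fmt.Println(dst) // [1 2 4 8 15 15]
}
```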
func ASCIIToSeq8Inplace ¶
func ASCIIToSeq8Inplace(main []byte)
ASCIIToSeq8Inplace converts the characters of main[pos] as follows:
'A'/'a' -> 1
'C'/'c' -> 2
'G'/'g' -> 4
'T'/'t' -> 8
anything else -> 15
func CleanASCIISeqInplace ¶
func CleanASCIISeqInplace(ascii8 []byte)
CleanASCIISeqInplace capitalizes 'a'/'c'/'g'/'t', and replaces everything non-ACGT with 'N'.
func CleanASCIISeqNoCapitalizeInplace ¶
func CleanASCIISeqNoCapitalizeInplace(ascii8 []byte)
CleanASCIISeqNoCapitalizeInplace replaces everything non-ACGTacgt with 'N'.
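The difference between the two cleaning functions is easiest to see side by side. cleanRef is a hypothetical scalar helper whose capitalize flag selects which of the two documented behaviors to mimic.

```go
package main

import "fmt"

// cleanRef is a hypothetical scalar reference for both cleaning variants.
// With capitalize == true it mimics CleanASCIISeqInplace; with false it
// mimics CleanASCIISeqNoCapitalizeInplace.
func cleanRef(ascii8 []byte, capitalize bool) {
	for i, c := range ascii8 {
		switch c {
		case 'A', 'C', 'G', 'T':
			// Already clean.
		case 'a', 'c', 'g', 't':
			if capitalize {
				ascii8[i] = c - ('a' - 'A')
			}
		default:
			ascii8[i] = 'N'
		}
	}
}

func main() {
	a := []byte("acgtNn-R")
	cleanRef(a, true)
	fmt.Println(string(a)) // ACGTNNNN

	b := []byte("acgtNn-R")
	cleanRef(b, false)
	fmt.Println(string(b)) // acgtNNNN
}
```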
func FillFastqRecordBodyFromNibbles ¶
func FillFastqRecordBodyFromNibbles(dst, src []byte, nBase int, baseTablePtr, qualTablePtr *NibbleLookupTable)
FillFastqRecordBodyFromNibbles fills the body (defined as the last three lines) of a 4-line FASTQ record, given a packed 4-bit representation of the base+qual information and the decoding tables. (Windows line-breaks are not supported.)
- len(dst) must be at least 2 * nBase + 4, but it's allowed to be larger.
- len(src) must be at least (nBase + 1) >> 1, but it's allowed to be larger.
- This is designed for read-length >= 32. It still produces the correct result for smaller lengths, but there is a fairly simple faster algorithm (using a pair of 256-element uint16 lookup tables and encoding/binary's binary.LittleEndian.PutUint16() function) for that case, which is being omitted for now due to irrelevance for our current use cases.
func IsNonACGTNPresent ¶
func IsNonACGTNPresent(ascii8 []byte) bool
IsNonACGTNPresent returns true iff the slice contains a character other than capital 'A', 'C', 'G', 'T', or 'N'.
func IsNonACGTPresent ¶
func IsNonACGTPresent(ascii8 []byte) bool
IsNonACGTPresent returns true iff the slice contains a character other than capital 'A', 'C', 'G', or 'T'.
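A scalar sketch of the two predicates follows. Both function names are hypothetical; note that lowercase bases count as "non-ACGT" here because only capital letters are accepted.

```go
package main

import "fmt"

// isNonACGTPresentRef is a hypothetical scalar reference: it reports whether
// any byte is something other than capital 'A', 'C', 'G', or 'T'.
func isNonACGTPresentRef(ascii8 []byte) bool {
	for _, c := range ascii8 {
		if c != 'A' && c != 'C' && c != 'G' && c != 'T' {
			return true
		}
	}
	return false
}

// isNonACGTNPresentRef is the same check, except that capital 'N' is also
// accepted.
func isNonACGTNPresentRef(ascii8 []byte) bool {
	for _, c := range ascii8 {
		if c != 'A' && c != 'C' && c != 'G' && c != 'T' && c != 'N' {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{"ACGT", "ACGNT", "acgt"} {
		b := []byte(s)
		fmt.Println(s, isNonACGTPresentRef(b), isNonACGTNPresentRef(b))
	}
	// ACGT false false
	// ACGNT true false
	// acgt true true
}
```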
func PackSeq ¶
func PackSeq(dst, src []byte)
PackSeq sets the bytes in dst[] as follows:
if pos is even, high 4 bits of dst[pos / 2] := src[pos]
if pos is odd, low 4 bits of dst[pos / 2] := src[pos]
if len(src) is odd, the low 4 bits of dst[len(src) / 2] are zero
It panics if len(dst) != (len(src) + 1) / 2.
This is the inverse of UnpackSeq().
WARNING: Actual values in dst[] bytes may be garbage if any src[] bytes are greater than 15; this function only guarantees that no buffer overflow will occur.
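The packing rule can be sketched as a scalar loop. packSeqRef is a hypothetical reference helper that follows the documented layout and length check; like the real function, it does not mask out-of-range inputs.

```go
package main

import "fmt"

// packSeqRef is a hypothetical scalar reference for the packed 4-bit layout:
// even positions go to the high nibble, odd positions to the low nibble.
func packSeqRef(dst, src []byte) {
	if len(dst) != (len(src)+1)/2 {
		panic("packSeqRef: len(dst) != (len(src) + 1) / 2")
	}
	for i := range dst {
		hi := src[2*i] << 4
		var lo byte // stays zero for the last byte when len(src) is odd
		if 2*i+1 < len(src) {
			lo = src[2*i+1]
		}
		dst[i] = hi | lo
	}
}

func main() {
	src := []byte{1, 2, 4, 8, 15} // A C G T N as .bam base codes
	dst := make([]byte, (len(src)+1)/2)
	packSeqRef(dst, src)
	fmt.Printf("%x\n", dst) // 1248f0
}
```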
func PackSeqUnsafe ¶
func PackSeqUnsafe(dst, src []byte)
PackSeqUnsafe sets the bytes in dst[] as follows:
if pos is even, high 4 bits of dst[pos / 2] := src[pos]
if pos is odd, low 4 bits of dst[pos / 2] := src[pos]
if len(src) is odd, the low 4 bits of dst[len(src) / 2] are zero
This is the inverse of UnpackSeqUnsafe().
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #3-4 are always satisfied when the last potentially-size-increasing operation on src[] is simd.{Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].
1. len(dst) == (len(src) + 1) / 2.
2. All elements of src[] are less than 16.
3. Capacity of src is at least RoundUpPow2(len(src) + 1, bytesPerVec), and the same is true for dst.
4. The caller does not care if a few bytes past the end of dst[] are changed.
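The capacity assumption shared by the Unsafe functions boils down to a small piece of arithmetic, sketched below. Everything here is illustrative: bytesPerVec is assumed to be 16, and roundUpPow2 and makePadded are hypothetical helpers. In real code the allocation helpers in base/simd named in the warning above are the intended way to obtain such buffers.

```go
package main

import "fmt"

// bytesPerVec is an assumed vector width (16 bytes, as for 128-bit
// registers); the real value is defined inside base/simd.
const bytesPerVec = 16

// roundUpPow2 rounds val up to the next multiple of alignment, which must be
// a power of two. Hypothetical stand-in for the RoundUpPow2 named above.
func roundUpPow2(val, alignment int) int {
	return (val + alignment - 1) &^ (alignment - 1)
}

// makePadded returns a zeroed slice of length n whose capacity is at least
// roundUpPow2(n+1, bytesPerVec), so vector-sized accesses that run slightly
// past the end stay inside the allocation.
func makePadded(n int) []byte {
	return make([]byte, n, roundUpPow2(n+1, bytesPerVec))
}

func main() {
	src := makePadded(37)
	dst := makePadded((len(src) + 1) / 2)
	fmt.Println(len(src), cap(src)) // 37 48
	fmt.Println(len(dst), cap(dst)) // 19 32
}
```

The "+ 1" means a slice whose length is already a multiple of the vector width still gets one extra vector of slack.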
func PackedSeqCount ¶
func PackedSeqCount(seq4 []byte, tablePtr *NibbleLookupTable, startPos, endPos int) int
PackedSeqCount counts the .bam base codes at positions startPos..(endPos - 1) of seq4 that belong to the given set, where seq4 is in .bam packed 4-bit big-endian format.
The set must be represented as table[x] == 1 when code x is in the set, and table[x] == 0 when code x isn't.
WARNING: This function does not validate the table, startPos, or endPos. It may crash or return a garbage result on invalid input. (However, it won't corrupt memory.)
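A scalar sketch of the counting rule is shown below. packedSeqCountRef is hypothetical, and a plain [16]byte stands in for NibbleLookupTable, whose internal layout is not described here.

```go
package main

import "fmt"

// packedSeqCountRef is a hypothetical scalar reference. table[x] must be 1
// when code x is in the set and 0 otherwise; a plain [16]byte is used here
// as a stand-in for NibbleLookupTable.
func packedSeqCountRef(seq4 []byte, table *[16]byte, startPos, endPos int) int {
	cnt := 0
	for pos := startPos; pos < endPos; pos++ {
		b := seq4[pos/2]
		var code byte
		if pos%2 == 0 {
			code = b >> 4 // even positions live in the high nibble
		} else {
			code = b & 15 // odd positions live in the low nibble
		}
		cnt += int(table[code])
	}
	return cnt
}

func main() {
	// Set containing C (code 2) and G (code 4).
	var gc [16]byte
	gc[2], gc[4] = 1, 1

	seq4 := []byte{0x12, 0x48, 0x24} // A C G T C G
	fmt.Println(packedSeqCountRef(seq4, &gc, 0, 6)) // 4
	fmt.Println(packedSeqCountRef(seq4, &gc, 1, 4)) // 2
}
```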
func PackedSeqCountTwo ¶
func PackedSeqCountTwo(seq4 []byte, table1Ptr, table2Ptr *NibbleLookupTable, startPos, endPos int) (int, int)
PackedSeqCountTwo counts, for each of the two given sets, the .bam base codes at positions startPos..(endPos - 1) of seq4 that belong to that set, where seq4 is in .bam packed 4-bit big-endian format.
The sets must be represented as table[x] == 1 when code x is in the set, and table[x] == 0 when code x isn't.
WARNING: This function does not validate the tables, startPos, or endPos. It may crash or return garbage results on invalid input. (However, it won't corrupt memory.)
func ReverseComp2 ¶
func ReverseComp2(dst, src []byte)
ReverseComp2 saves the reverse-complement of src[] to dst[], assuming that they're encoded with one byte per base, ACGT=0123. It panics if len(dst) != len(src).
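With ACGT encoded as 0123, the complement of a code b is simply 3 - b, so the operation can be sketched in a few lines. reverseComp2Ref is a hypothetical scalar reference, not the package's implementation.

```go
package main

import "fmt"

// reverseComp2Ref is a hypothetical scalar reference for one-byte-per-base
// ACGT=0123 data: reverse the order and replace each code b with 3 - b.
func reverseComp2Ref(dst, src []byte) {
	if len(dst) != len(src) {
		panic("reverseComp2Ref: len(dst) != len(src)")
	}
	n := len(src)
	for i, b := range src {
		dst[n-1-i] = 3 - b
	}
}

func main() {
	src := []byte{0, 0, 1, 2, 3} // AACGT
	dst := make([]byte, len(src))
	reverseComp2Ref(dst, src)
	fmt.Println(dst) // [0 1 2 3 3], i.e. ACGTT
}
```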
func ReverseComp2Inplace ¶
func ReverseComp2Inplace(acgt8 []byte)
ReverseComp2Inplace reverse-complements acgt8[], assuming that it's encoded with one byte per base, ACGT=0123.
func ReverseComp2Unsafe ¶
func ReverseComp2Unsafe(dst, src []byte)
ReverseComp2Unsafe saves the reverse-complement of src[] to dst[], assuming that they're encoded with one byte per base, ACGT=0123.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is simd.{Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true of dst[].
1. len(src) == len(dst).
2. Capacity of src is at least RoundUpPow2(len(src) + 1, bytesPerVec), and the same is true of dst.
3. The caller does not care if a few bytes past the end of dst[] are changed.
func ReverseComp2UnsafeInplace ¶
func ReverseComp2UnsafeInplace(acgt8 []byte)
ReverseComp2UnsafeInplace reverse-complements acgt8[], assuming that it's encoded with one byte per base, ACGT=0123.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on acgt8[] is simd.{Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().
1. Capacity of acgt8[] is at least RoundUpPow2(len(acgt8) + 1, bytesPerVec).
2. The caller does not care if a few bytes past the end of acgt8[] are changed.
func ReverseComp4 ¶
func ReverseComp4(dst, src []byte)
ReverseComp4 saves the reverse-complement of src[] to dst[], assuming .bam seq-field encoding with one byte per base, each holding a 4-bit code. It panics if len(dst) != len(src).
WARNING: If a src[] value is larger than 15, it's possible for this to immediately crash, and it's also possible for this to return and fill dst[] with garbage. The only promise is that it does not scribble over arbitrary memory.
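In the 4-bit code order listed in SeqASCIITable, bit 0 stands for A, bit 1 for C, bit 2 for G, and bit 3 for T, so complementing a code amounts to reversing its four bits. The sketch below uses that observation; complement4 and reverseComp4Ref are hypothetical helpers, and the package may well use a lookup table instead.

```go
package main

import "fmt"

// complement4 reverses the four bits of a .bam base code, which swaps A<->T
// and C<->G and handles the ambiguity codes consistently (e.g. M <-> K).
func complement4(code byte) byte {
	return (code&1)<<3 | (code&2)<<1 | (code&4)>>1 | (code&8)>>3
}

// reverseComp4Ref is a hypothetical scalar reference for one-byte-per-base
// data holding 4-bit .bam codes. Inputs above 15 are not handled.
func reverseComp4Ref(dst, src []byte) {
	if len(dst) != len(src) {
		panic("reverseComp4Ref: len(dst) != len(src)")
	}
	n := len(src)
	for i, b := range src {
		dst[n-1-i] = complement4(b)
	}
}

func main() {
	src := []byte{1, 2, 4, 8, 15, 3} // A C G T N M
	dst := make([]byte, len(src))
	reverseComp4Ref(dst, src)
	fmt.Println(dst) // [12 15 1 2 4 8], i.e. K N A C G T
}
```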
func ReverseComp4Inplace ¶
func ReverseComp4Inplace(seq8 []byte)
ReverseComp4Inplace reverse-complements seq8[], assuming that it's using .bam seq-field encoding with one byte per base, each holding a 4-bit code.
WARNING: If a seq8[] value is larger than 15, it's possible for this to immediately crash, and it's also possible for this to return and fill seq8[] with garbage. The only promise is that it does not scribble over arbitrary memory.
func ReverseComp4Unsafe ¶
func ReverseComp4Unsafe(dst, src []byte)
ReverseComp4Unsafe saves the reverse-complement of src[] to dst[], assuming .bam seq-field encoding with one byte per base, each holding a 4-bit code.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #3-4 are always satisfied when the last potentially-size-increasing operation on src[] is simd.{Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true of dst[].
1. len(src) == len(dst).
2. All elements of src[] are less than 16.
3. Capacity of src is at least RoundUpPow2(len(src) + 1, bytesPerVec), and the same is true of dst.
4. The caller does not care if a few bytes past the end of dst[] are changed.
func ReverseComp4UnsafeInplace ¶
func ReverseComp4UnsafeInplace(seq8 []byte)
ReverseComp4UnsafeInplace reverse-complements seq8[], assuming that it's using .bam seq-field encoding with one byte per base, each holding a 4-bit code.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on seq8[] is simd.{Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().
1. All elements of seq8[] are less than 16.
2. Capacity of seq8 is at least RoundUpPow2(len(seq8) + 1, bytesPerVec).
3. The caller does not care if a few bytes past the end of seq8[] are changed.
func ReverseComp8Inplace ¶
func ReverseComp8Inplace(ascii8 []byte)
ReverseComp8Inplace reverse-complements ascii8[], assuming that it's using ASCII encoding. More precisely, it maps 'A'/'a' to 'T', 'C'/'c' to 'G', 'G'/'g' to 'C', 'T'/'t' to 'A', and everything else to 'N'.
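The mapping described above can be sketched as a scalar in-place loop. reverseComp8InplaceRef is a hypothetical reference helper for the validating behavior only.

```go
package main

import "fmt"

// reverseComp8InplaceRef is a hypothetical scalar reference: it reverses the
// slice and complements each ASCII base, mapping anything outside
// ACGT/acgt to 'N'.
func reverseComp8InplaceRef(ascii8 []byte) {
	comp := func(c byte) byte {
		switch c {
		case 'A', 'a':
			return 'T'
		case 'C', 'c':
			return 'G'
		case 'G', 'g':
			return 'C'
		case 'T', 't':
			return 'A'
		}
		return 'N'
	}
	// Walk inward from both ends; i == j handles the middle element of an
	// odd-length slice.
	for i, j := 0, len(ascii8)-1; i <= j; i, j = i+1, j-1 {
		ascii8[i], ascii8[j] = comp(ascii8[j]), comp(ascii8[i])
	}
}

func main() {
	seq := []byte("AAcgtNx")
	reverseComp8InplaceRef(seq)
	fmt.Println(string(seq)) // NNACGTT
}
```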
func ReverseComp8InplaceNoValidate ¶
func ReverseComp8InplaceNoValidate(ascii8 []byte)
ReverseComp8InplaceNoValidate reverse-complements ascii8[], assuming that it's using ASCII encoding, and all values are in {0, '0', 'A', 'C', 'G', 'T', 'N', 'a', 'c', 'g', 't', 'n'}.
If the input assumption is satisfied, output is restricted to 'A'/'C'/'G'/'T'/'N'. Other bytes may be written if the input assumption is not satisfied.
This usually takes ~35% less time than the validating function.
func ReverseComp8NoValidate ¶
func ReverseComp8NoValidate(dst, src []byte)
ReverseComp8NoValidate writes the reverse-complement of src[] to dst[], assuming src is using ASCII encoding, and all values are in {0, '0', 'A', 'C', 'G', 'T', 'N', 'a', 'c', 'g', 't', 'n'}.
If the input assumption is satisfied, output is restricted to 'A'/'C'/'G'/'T'/'N'. Other bytes may be written if the input assumption is not satisfied.
It panics if len(dst) != len(src).
func UnpackAndReplaceSeq ¶
func UnpackAndReplaceSeq(dst, src []byte, tablePtr *NibbleLookupTable)
UnpackAndReplaceSeq sets the bytes in dst[] as follows:
if pos is even, dst[pos] := table[src[pos / 2] >> 4]
if pos is odd, dst[pos] := table[src[pos / 2] & 15]
It panics if len(src) != (len(dst) + 1) / 2.
Nothing bad happens if len(dst) is odd and some low bits in the last src[] byte are set, though it's generally good practice to ensure that case doesn't come up.
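As a sketch of the unpack-and-replace rule, the scalar loop below decodes packed .bam bases to ASCII. unpackAndReplaceSeqRef is hypothetical, and a plain [16]byte holding the same values as SeqASCIITable stands in for NibbleLookupTable.

```go
package main

import "fmt"

// seqASCII holds the same 16 values that SeqASCIITable is built from; a
// plain array is used here as a stand-in for NibbleLookupTable.
var seqASCII = [16]byte{'=', 'A', 'C', 'M', 'G', 'R', 'S', 'V',
	'T', 'W', 'Y', 'H', 'K', 'D', 'B', 'N'}

// unpackAndReplaceSeqRef is a hypothetical scalar reference: it unpacks two
// 4-bit codes per src byte and writes table[code] for each position.
func unpackAndReplaceSeqRef(dst, src []byte, table *[16]byte) {
	if len(src) != (len(dst)+1)/2 {
		panic("unpackAndReplaceSeqRef: len(src) != (len(dst) + 1) / 2")
	}
	for pos := range dst {
		b := src[pos/2]
		if pos%2 == 0 {
			dst[pos] = table[b>>4]
		} else {
			dst[pos] = table[b&15]
		}
	}
}

func main() {
	src := []byte{0x12, 0x48, 0xf0}
	dst := make([]byte, 5)
	unpackAndReplaceSeqRef(dst, src, &seqASCII)
	fmt.Println(string(dst)) // ACGTN
}
```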
func UnpackAndReplaceSeqSubset ¶
func UnpackAndReplaceSeqSubset(dst, src []byte, tablePtr *NibbleLookupTable, startPos, endPos int)
UnpackAndReplaceSeqSubset sets the bytes in dst[] as follows:
if srcPos is even, dst[srcPos-startPos] := table[src[srcPos / 2] >> 4]
if srcPos is odd, dst[srcPos-startPos] := table[src[srcPos / 2] & 15]
It panics if len(dst) != endPos - startPos, startPos < 0, or len(src) * 2 < endPos.
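The subset variant differs only in its indexing: positions are counted in src, and output starts at dst[0]. A hypothetical scalar sketch, again with a plain [16]byte standing in for NibbleLookupTable:

```go
package main

import "fmt"

// unpackSubsetRef is a hypothetical scalar reference for the subset variant.
// It decodes base positions startPos..(endPos - 1) of src into dst.
func unpackSubsetRef(dst, src []byte, table *[16]byte, startPos, endPos int) {
	if len(dst) != endPos-startPos || startPos < 0 || len(src)*2 < endPos {
		panic("unpackSubsetRef: invalid arguments")
	}
	for srcPos := startPos; srcPos < endPos; srcPos++ {
		b := src[srcPos/2]
		if srcPos%2 == 0 {
			dst[srcPos-startPos] = table[b>>4]
		} else {
			dst[srcPos-startPos] = table[b&15]
		}
	}
}

func main() {
	table := [16]byte{'=', 'A', 'C', 'M', 'G', 'R', 'S', 'V',
		'T', 'W', 'Y', 'H', 'K', 'D', 'B', 'N'}
	src := []byte{0x12, 0x48, 0xf0} // A C G T N (plus one padding nibble)
	dst := make([]byte, 3)
	unpackSubsetRef(dst, src, &table, 1, 4)
	fmt.Println(string(dst)) // CGT
}
```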
func UnpackAndReplaceSeqUnsafe ¶
func UnpackAndReplaceSeqUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)
UnpackAndReplaceSeqUnsafe sets the bytes in dst[] as follows:
if pos is even, dst[pos] := table[src[pos / 2] >> 4]
if pos is odd, dst[pos] := table[src[pos / 2] & 15]
It panics if len(src) != (len(dst) + 1) / 2.
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is simd.{Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].
1. len(src) == (len(dst) + 1) / 2.
2. Capacity of src is at least RoundUpPow2(len(src) + 1, bytesPerVec), and the same is true for dst.
3. The caller does not care if a few bytes past the end of dst[] are changed.
func UnpackSeq ¶
func UnpackSeq(dst, src []byte)
UnpackSeq sets the bytes in dst[] as follows:
if pos is even, dst[pos] := src[pos / 2] >> 4
if pos is odd, dst[pos] := src[pos / 2] & 15
It panics if len(src) != (len(dst) + 1) / 2.
Nothing bad happens if len(dst) is odd and some low bits in the last src[] byte are set, though it's generally good practice to ensure that case doesn't come up.
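The unpacking rule, including the odd-length case mentioned above, can be sketched as follows. unpackSeqRef is a hypothetical scalar reference; the example deliberately leaves stray low bits in the last src byte to show that they are ignored.

```go
package main

import "fmt"

// unpackSeqRef is a hypothetical scalar reference: each src byte yields two
// 4-bit values, high nibble first.
func unpackSeqRef(dst, src []byte) {
	if len(src) != (len(dst)+1)/2 {
		panic("unpackSeqRef: len(src) != (len(dst) + 1) / 2")
	}
	for pos := range dst {
		b := src[pos/2]
		if pos%2 == 0 {
			dst[pos] = b >> 4
		} else {
			dst[pos] = b & 15
		}
	}
}

func main() {
	// The low nibble of the last byte (0xa) has no destination position
	// because len(dst) is odd, so it is never read into dst.
	src := []byte{0x12, 0x48, 0xfa}
	dst := make([]byte, 5)
	unpackSeqRef(dst, src)
	fmt.Println(dst) // [1 2 4 8 15]
}
```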
func UnpackSeqUnsafe ¶
func UnpackSeqUnsafe(dst, src []byte)
UnpackSeqUnsafe sets the bytes in dst[] as follows:
if pos is even, dst[pos] := src[pos / 2] >> 4
if pos is odd, dst[pos] := src[pos / 2] & 15
WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is simd.{Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].
1. len(src) == (len(dst) + 1) / 2.
2. Capacity of src is at least RoundUpPow2(len(src) + 1, bytesPerVec), and the same is true for dst.
3. The caller does not care if a few bytes past the end of dst[] are changed.
Types ¶
type NibbleLookupTable ¶
type NibbleLookupTable = simd.NibbleLookupTable
NibbleLookupTable is re-exported here to reduce base/simd import clutter.
func MakeNibbleLookupTable ¶
func MakeNibbleLookupTable(table [16]byte) (t NibbleLookupTable)
MakeNibbleLookupTable is re-exported here to reduce base/simd import clutter.