Documentation ¶
Index ¶
- Constants
- func Bmh(x, p string, callback func(int))
- func BmhWithAlphabet(x, p string, callback func(int))
- func BmhWithMap(x, p string, callback func(int))
- func BorderSearch(x, p string, callback func(int))
- func Borderarray(x string) []int
- func Bwt(x []byte, sa []int32) []byte
- func CountEdits(x, p string, pos int, cigar string) (int, error)
- func ExtractAlignment(x, p string, pos int, cigar string) (subx, subp string, err error)
- func FMIndexApproxFromTables(tbls *FMIndexTables) func(p string, edits int, cb func(i int, cigar string))
- func FMIndexApproxPreprocess(x string) func(p string, edits int, cb func(i int, cigar string))
- func FMIndexExactFromTables(tbls *FMIndexTables) func(p string, cb func(i int))
- func FMIndexExactPreprocess(x string) func(p string, cb func(i int))
- func Kmp(x, p string, callback func(int))
- func Naive(x, p string, callback func(int))
- func OpsToCigar(ops EditOps) string
- func ReverseBwt(bwt []byte) []byte
- func ReverseString(s string) string
- func Sais(x string) (sa []int32)
- func SaisWithAlphabet(x string, alpha *Alphabet) ([]int32, error)
- func Skew(x string) []int32
- func SkewWithAlphabet(x string, alpha *Alphabet) ([]int32, error)
- func StSaConstruction(x string) []int32
- func StrictBorderarray(x string) []int
- type Alphabet
- func (alpha *Alphabet) Contains(a byte) bool
- func (alpha *Alphabet) GobDecode(b []byte) (err error)
- func (alpha Alphabet) GobEncode() (res []byte, err error)
- func (alpha *Alphabet) MapToBytes(x string) ([]byte, error)
- func (alpha *Alphabet) MapToBytesWithSentinel(x string) ([]byte, error)
- func (alpha *Alphabet) MapToInts(x string) ([]int32, error)
- func (alpha *Alphabet) MapToIntsWithSentinel(x string) ([]int32, error)
- func (alpha *Alphabet) RevmapBytes(x []byte) string
- func (alpha *Alphabet) RevmapBytesStripSentinel(x []byte) string
- func (alpha *Alphabet) Size() int
- type AlphabetLookupError
- type ApproxEdit
- type CTab
- type EdgeLabel
- type EditOps
- type FMIndexTables
- type InnerNode
- type InvalidCigar
- type LeafNode
- type OTab
- type STNode
- type STNodeType
- type SharedNode
- type SuffixTree
- type Trie
Constants ¶
const ( // Sentinel is a unique byte (zero) used in several algorithms // to be different and smaller than all other letters. Sentinel = '\x00' // SentinelSymbol is a rune you can use to display the sentinel // when you translate byte slices into strings. SentinelSymbol = rune('𝕊') )
const ( M ApproxEdit = iota // Match/mismatch operations I = iota // Insertion operations D = iota // Deletion operations )
Approximative matching edit operations.
Variables ¶
This section is empty.
Functions ¶
func Bmh ¶
Bmh runs the O(nm) worst-case but expected sub-linear time Boyer-Moore-Horspool algorithm. This version uses a table of size 256 to map bytes to jumps, exploiting that we get bytes out of indexing strings.
Parameters:
- x: the string we search in.
- p: the string we search for
- callback: a function called for each occurrence
func BmhWithAlphabet ¶
BmhWithAlphabet runs the O(nm) worst-case but expected sub-linear time Boyer-Moore-Horspool algorithm. This version maps the input strings before search, so we know their alphabet size, and can create a jump table of the apprporiate size.
Parameters:
- x: the string we search in.
- p: the string we search for
- callback: a function called for each occurrence
func BmhWithMap ¶
BmhWithMap runs the O(nm) worst-case but expected sub-linear time Boyer-Moore-Horspool algorithm. It uses a map for the jump table, and can theoretically handle more characters than bytes (but doesn't, since indexing into strings gives us bytes). It demonstrates the performance hit you get from using a map rather than an array as in Bmh.
Parameters:
- x: the string we search in.
- p: the string we search for
- callback: a function called for each occurrence
func BorderSearch ¶
BorderSearch runs the O(n+m) time algorithm based on building a border array and reporting when its value matches m.
Parameters:
- x: the string we search in.
- p: the string we search for
- callback: a function called for each occurrence
func Borderarray ¶
Borderarray computes the border array over the string x. The border array ba will at index i have the length of the longest proper border of the string x[:i+1], i.e. the longest non-empty string that is both a prefix and a suffix of x[:i+1].
func Bwt ¶
Bwt gives you the Burrows-Wheeler transform of a string, computed using the suffix array for the string. The string should have a sentinel
func CountEdits ¶
CountEdits counts the number of edits in the local alignment between x and p specified by pos and cigar
func ExtractAlignment ¶
ExtractAlignment extracts a pairwise alignment from the reference, x, the read, p, the position and the edits cigar.
func FMIndexApproxFromTables ¶
func FMIndexApproxFromTables(tbls *FMIndexTables) func(p string, edits int, cb func(i int, cigar string))
FMIndexApproxFromTables return a search function from the preprocessed tables.
func FMIndexApproxPreprocess ¶
FMIndexApproxPreprocess preprocesses the string x and returns a function that you can use to efficiently search in x.
func FMIndexExactFromTables ¶
func FMIndexExactFromTables(tbls *FMIndexTables) func(p string, cb func(i int))
FMIndexExactFromTables returns a search function based on the preprocessed tables
func FMIndexExactPreprocess ¶
FMIndexExactPreprocess preprocesses the string x and returns a function that you can use to efficiently search in x.
func Kmp ¶
Kmp runs the O(n+m) time Knuth-Morris-Prat algorithm.
Parameters:
- x: the string we search in.
- p: the string we search for
- callback: a function called for each occurrence
func Naive ¶
Naive runs the naive (duh) O(nm) times search algorithm.
Parameters:
- x: the string we search in.
- p: the string we search for
- callback: a function called for each occurrence
func ReverseBwt ¶
ReverseBwt reconstructs the original string from the bwt string.
func ReverseString ¶
ReverseString returns a new string that is the reverse of the input string.
func SaisWithAlphabet ¶
SaisWithAlphabet builds a suffix array from the string x, first mapping it to bytes using the alphabet alpha.
func SkewWithAlphabet ¶
SkewWithAlphabet builds the suffix array of a string, first mapping it to a byte slice using the alphabet alpha.
func StSaConstruction ¶
StSaConstruction constructs a suffix array from a suffix tree.
func StrictBorderarray ¶
StrictBorderarray computes the strict border array over the string x. This is almost the same as the border array, but ba[i] will be the longest proper border of the string x[:i+1] such that x[ba[i]] != x[i].
Types ¶
type Alphabet ¶
type Alphabet struct {
// contains filtered or unexported fields
}
Alphabet handles mapping from strings to smaller alphabets of bytes.
func MapString ¶
MapString creates an alphabet from the input and then maps the string through it, returning both resulting byte slice and alphabet.
func MapStringWithSentinel ¶
MapStringWithSentinel creates an alphabet from the input and then maps the string through it, returning both resulting byte slice and alphabet. Unlike MapString, MapStringWithSentinel will add a terminal zero byte to the byte slice it returns.
func NewAlphabet ¶
NewAlphabet creates an alphabet consisting of the bytes in ref only.
func (*Alphabet) GobDecode ¶
GobDecode implements the encoder interface for serialising an alphabet to a stream of bytes
func (Alphabet) GobEncode ¶
GobEncode implements the encoder interface for serialising an alphabet to a stream of bytes
func (*Alphabet) MapToBytes ¶
MapToBytes translates a string into a byte slice, mapping characters according to the alphabet
func (*Alphabet) MapToBytesWithSentinel ¶
MapToBytesWithSentinel translates a string into a byte slice, mapping characters according to the alphabet. The resulting byte slice has a terminal zero, acting as a sentinel.
func (*Alphabet) MapToInts ¶
MapToInts translates a string into an integer slice, mapping characters according to the alphabet. It only differs from MapToBytes in the type of the output, but is used in algorithms where we need to operate on integers rather than the smaller bytes.
func (*Alphabet) MapToIntsWithSentinel ¶
MapToIntsWithSentinel translates a string into an integer slice, mapping characters according to the alphabet. The resulting int slice has a terminal zero, acting as a sentinel. It only differs from MapToBytes in the type of the output, but is used in algorithms where we need to operate on integers rather than the smaller bytes.
func (*Alphabet) RevmapBytes ¶
RevmapBytes maps a byte slice back into a string according to the alphabet that was used to translate the string into bytes in the first place.
func (*Alphabet) RevmapBytesStripSentinel ¶
RevmapBytesStripSentinel maps a byte slice back into a string according to the alphabet that was used to translate the string into bytes in the first place. RevmapBytesStripSentinel will strip the last character in the input from the output, getting rid of a sentinel that (hopefully) was added when the byte slice was created.
type AlphabetLookupError ¶
type AlphabetLookupError struct {
// contains filtered or unexported fields
}
AlphabetLookupError are errors that occur if you look up a character that is not in the alphabet.
func (*AlphabetLookupError) Error ¶
func (err *AlphabetLookupError) Error() string
Error implements the interface for errors.
type CTab ¶
type CTab struct {
CumSum []int
}
CTab Structur holding the C-table for BWT search. This is a map from letters in the alphabet to the cumulative sum of how often we see letters in the BWT
type EdgeLabel ¶
type EdgeLabel []byte
EdgeLabel is the representation of a string along an edge. It is a byte slice, so it takes up constant space, holding only a pointer, length and capacity, while the underlying string is shared between the edges in the tree.
type EditOps ¶
type EditOps = []ApproxEdit
EditOps is a type for a sequence of edit operations
func CigarToOps ¶
CigarToOps turns a cigar string into the list of edit ops.
type FMIndexTables ¶
type FMIndexTables struct { Alpha *Alphabet Sa []int32 Ctab *CTab Otab *OTab // for approx matching Rotab *OTab }
FMIndexTables contains the preprocessed tables used for FM-index searching
func BuildFMIndexApproxTables ¶
func BuildFMIndexApproxTables(x string) *FMIndexTables
BuildFMIndexExactTables builds the preprocessing tables for exact FM-index searching.
func BuildFMIndexExactTables ¶
func BuildFMIndexExactTables(x string) *FMIndexTables
BuildFMIndexExactTables builds the preprocessing tables for exact FM-index searching.
type InvalidCigar ¶
type InvalidCigar struct {
// contains filtered or unexported fields
}
InvalidCigar are errors when you use a cigar that isn't in the right format
func NewInvalidCigar ¶
func NewInvalidCigar(x string) *InvalidCigar
NewInvalidCigar creates an InvalidCigar error
func (*InvalidCigar) Error ¶
func (err *InvalidCigar) Error() string
Error implements the interface for errors.
func (*InvalidCigar) Is ¶
func (err *InvalidCigar) Is(other error) bool
Is implements the Is interface for errors.
type LeafNode ¶
type LeafNode struct { Index int }
LeafNode contains the additional properties that only leaves have.
type OTab ¶
type OTab struct {
// contains filtered or unexported fields
}
OTab Holds the o-table (rank table) from a BWT string
func NewOTab ¶
NewOTab builds the o-table from a string. It uses the suffix array to get the BWT and a c-table to handle the alphabet.
func (*OTab) GobDecode ¶
GobDecode implements the decoder interface for serialising to a stream of bytes
type STNode ¶
type STNode struct { NodeType STNodeType // contains filtered or unexported fields }
STNode wraps either a leaf or an inner node. Use the node type determine which, before you access it as a node.
func (STNode) Inner ¶
Inner casts a node to a leaf. You should only do this if the NodeType is Inner.
func (STNode) IsNil ¶
IsNil returns true if the node represents a nil pointer. If it does, you cannot cast it to any other node type.
func (STNode) LeafIndices ¶
LeafIndices maps fn over all the leaf indices in the subtree rooted at n.
func (STNode) PathLabel ¶
PathLabel returns the string from the root down to a node. Do not call it with a NIL node, that is an error the function will crash.
func (STNode) Shared ¶
func (n STNode) Shared() *SharedNode
Shared returns a pointer to the shared part of leaves and inner nodes. It is an error to access a node if it IsNil(), but otherwise, you can get the shared part of both leaves and inner nodes.
type STNodeType ¶
type STNodeType int
STNodeType is a tag for identifying when we have leaves and when we have inner nodes.
const ( // Leaf means that the node is a leaf Leaf STNodeType = iota // Inner means that the node is an inner node Inner STNodeType = iota )
type SharedNode ¶
type SharedNode struct {}
SharedNode contains the attributes that both leaves and inner nodes have.
type SuffixTree ¶
SuffixTree is the representation of a suffix tree.
func McCreight ¶
func McCreight(x string) *SuffixTree
McCreight constructs a suffix tree using McCreight's algorithm.
func NaiveST ¶
func NaiveST(x string) *SuffixTree
NaiveST is the naive O(n²) construction algorithm.
func (*SuffixTree) ComputeSuffixAndLcpArray ¶
func (st *SuffixTree) ComputeSuffixAndLcpArray() (sa, lcp []int32)
ComputeSuffixAndLcpArray constructs a suffix array and longest common prefix array from a suffix tree.
func (*SuffixTree) Search ¶
func (st *SuffixTree) Search(p string, visitor func(int))
Search maps visitor through all the leaves in the subtree found by a search.
func (*SuffixTree) ToDot ¶
func (st *SuffixTree) ToDot(w io.Writer)
ToDot writes a dot representation of the tree to the output writer w.
type Trie ¶
type Trie struct { // The parent of the node, unless this node is the root Parent *Trie // The suffix link of this node Suffix *Trie // Outlist for Aho-Corasick Outlist *Trie // The string label of this node. We assume that // only one string can be mapped to the same node. Label int // The children of this node Children [256]*Trie // If you map the alphabet, you could save some space... }
Trie is both a trie and a node in a trie.
func (*Trie) FindNode ¶
FindNode finds the node at the end of the string, or returns nil if there isn't one.
func (*Trie) SetSuffixAndOutput ¶
func (t *Trie) SetSuffixAndOutput()
SetSuffixAndOutput sets the suffix link and output for Aho-Corasick