Documentation ¶
Index ¶
- Constants
- Variables
- func BlockAlign(a, b []byte)
- func Hirschberg(x, y []byte) (a, b []byte, sc int)
- func HirschbergRecur(x, y []byte, basec, baser int) ([]int, int)
- func SeqPairNormalize(seq_a, seq_b []byte) error
- func SimpRev(x, y []byte)
- func Simp_b(x, y []byte, basec, baser int) ([]int, int)
- func Simp_b_old(x, y []byte, basec, baser int) ([]int, int)
- type Diff
- type Memz
Constants ¶
const ( DIFF = iota GAPA = iota GAPB = iota SEQ = iota )
Variables ¶
var GAP int
var Score [][]int
Functions ¶
func BlockAlign ¶
func BlockAlign(a, b []byte)
This is a heuristic alignment. This does not do global alignemtn but does a hybrid.
For two given sequences, split it into 'blocks', where the blocks are of variable size and are marked by checkpoints in a rolling hash.
The rolling hash checkpoints are marked when the low order bits are all zero of a Rabin fingerprint. The number of low order bits for the fingerprint is taken to be:
1 + floor( sqrt( min(length(a), length(b)) ) )
When a sub sequence is marked by it's endpoints in the rolling hash, the CRC32 checksum is calculated and stored.
The two sequences checksums are then compared and an attempt to align matching checksums is made on the assumption that both sequences are mostly the same with some minor alterations in the middle.
Runs of sub sequence that don't have aligned checksums are recursively passed into Memz which a lower threshold set to mark when true global alignment should be performed.
If the mismatch run is found to be the length of the sequence, this function fails.
If the checksum alignment step fails, this function fails.
**THIS IS NOT GLOBAL ALIGNEMT**. This is a heuristic. This is primarily meant to be run on long strings that are relatively similar to each other. Though this isn't global alignemtn, some applications may find the alignment produced (if this heuristic succeeds) good enoug.
func Hirschberg ¶
Mostly a wrapper for HirschbergRecur. HirschbergRecur returns an array and score, with even elements of the array being the 'x' positions in the implied dynamic programming matrix and the odd positions being the implied 'y' positions of the implied dynamic programming matrix.
e.g. x = abc y = c ipath = [ 0, -1, 1, -1, 2, 0 ]
would imply an alignment of abc --c
Hirschberg recur constructs the two strings (with '-' as the gap character) and the score.
func HirschbergRecur ¶
Hirschberg's algorithm for dynamic programming. Hirschberg's algorithm finds the optimal score along with the path in linear space and quadratic time.
The implicit DP matrix can be thought of as having the x bytes along the columns and y bytes along the rows with a gap ('-') prefix character attached to each. e.g. for x = 'ox', y = 'fox':
x0 x1 - o x - . . .
y0 f . . . y1 o . . . y2 x . . .
The Hirschberg algorithm works by finding a midpoint where the path must pass through then recursively applying the Hirschberg on the upper left and lower right sub problem. The midpoint is calculated by keeping the column of scores for the upper left and lower right sub problem then choosing the appropriate value.
Each midpoint can be stored and constitues only an additional linear amount of space to store the path.
When the sub problem becomes small enough (for example, when |x| or |y| <= 2) vanilla dynamic programming can be applied.
HirschbergRecur returns an array and score, with even elements of the array being the 'x' positions in the implied dynamic programming matrix and the odd positions being the implied 'y' positions of the implied dynamic programming matrix.
e.g. x = abc y = c ipath = [ 0, -1, 1, -1, 2, 0 ]
would imply an alignment of abc --c
func SeqPairNormalize ¶
Types ¶
type Memz ¶
func (*Memz) AlignDelta ¶
func (*Memz) DebugPrint ¶
func (x *Memz) DebugPrint()
func (*Memz) DebugPrintCache ¶
func (x *Memz) DebugPrintCache()