memz

package
v0.0.0-...-7a0a068 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 7, 2023 License: AGPL-3.0, AGPL-3.0 Imports: 4 Imported by: 1

README

memz

Documentation

Index

Constants

View Source
const (
	DIFF = iota
	GAPA = iota
	GAPB = iota
	SEQ  = iota
)

Variables

View Source
var GAP int
View Source
var Score [][]int

Functions

func BlockAlign

func BlockAlign(a, b []byte)

This is a heuristic alignment. This does not do global alignemtn but does a hybrid.

For two given sequences, split it into 'blocks', where the blocks are of variable size and are marked by checkpoints in a rolling hash.

The rolling hash checkpoints are marked when the low order bits are all zero of a Rabin fingerprint. The number of low order bits for the fingerprint is taken to be:

1 + floor( sqrt( min(length(a), length(b)) ) )

When a sub sequence is marked by it's endpoints in the rolling hash, the CRC32 checksum is calculated and stored.

The two sequences checksums are then compared and an attempt to align matching checksums is made on the assumption that both sequences are mostly the same with some minor alterations in the middle.

Runs of sub sequence that don't have aligned checksums are recursively passed into Memz which a lower threshold set to mark when true global alignment should be performed.

If the mismatch run is found to be the length of the sequence, this function fails.

If the checksum alignment step fails, this function fails.

**THIS IS NOT GLOBAL ALIGNEMT**. This is a heuristic. This is primarily meant to be run on long strings that are relatively similar to each other. Though this isn't global alignemtn, some applications may find the alignment produced (if this heuristic succeeds) good enoug.

func Hirschberg

func Hirschberg(x, y []byte) (a, b []byte, sc int)

Mostly a wrapper for HirschbergRecur. HirschbergRecur returns an array and score, with even elements of the array being the 'x' positions in the implied dynamic programming matrix and the odd positions being the implied 'y' positions of the implied dynamic programming matrix.

e.g. x = abc y = c ipath = [ 0, -1, 1, -1, 2, 0 ]

would imply an alignment of abc --c

Hirschberg recur constructs the two strings (with '-' as the gap character) and the score.

func HirschbergRecur

func HirschbergRecur(x, y []byte, basec, baser int) ([]int, int)

Hirschberg's algorithm for dynamic programming. Hirschberg's algorithm finds the optimal score along with the path in linear space and quadratic time.

The implicit DP matrix can be thought of as having the x bytes along the columns and y bytes along the rows with a gap ('-') prefix character attached to each. e.g. for x = 'ox', y = 'fox':

     x0 x1
   -  o  x
-  .  .  .

y0 f . . . y1 o . . . y2 x . . .

The Hirschberg algorithm works by finding a midpoint where the path must pass through then recursively applying the Hirschberg on the upper left and lower right sub problem. The midpoint is calculated by keeping the column of scores for the upper left and lower right sub problem then choosing the appropriate value.

Each midpoint can be stored and constitues only an additional linear amount of space to store the path.

When the sub problem becomes small enough (for example, when |x| or |y| <= 2) vanilla dynamic programming can be applied.

HirschbergRecur returns an array and score, with even elements of the array being the 'x' positions in the implied dynamic programming matrix and the odd positions being the implied 'y' positions of the implied dynamic programming matrix.

e.g. x = abc y = c ipath = [ 0, -1, 1, -1, 2, 0 ]

would imply an alignment of abc --c

func SeqPairNormalize

func SeqPairNormalize(seq_a, seq_b []byte) error

func SimpRev

func SimpRev(x, y []byte)

func Simp_b

func Simp_b(x, y []byte, basec, baser int) ([]int, int)

Simple dynamic programming. Construct an n = |x| by m = |y| matrix (n columns, m rows).

func Simp_b_old

func Simp_b_old(x, y []byte, basec, baser int) ([]int, int)

Types

type Diff

type Diff struct {
	PosA int
	PosB int
	Type int
	Len  int
}

type Memz

type Memz struct {
	Cache       []int
	ScoreMatrix [256][256]int
	Gap         int
	GapA        int
	GapB        int

	N, M int
}

func New

func New() *Memz

func (*Memz) Align

func (x *Memz) Align(a, b []byte) ([]byte, []byte)

func (*Memz) AlignDelta

func (x *Memz) AlignDelta(a, b []byte) []Diff

func (*Memz) DebugPrint

func (x *Memz) DebugPrint()

func (*Memz) DebugPrintCache

func (x *Memz) DebugPrintCache()

func (*Memz) Init

func (x *Memz) Init()

func (*Memz) Score

func (x *Memz) Score(a, b []byte) int

func (*Memz) SetScore

func (x *Memz) SetScore(score_matrix map[byte]map[byte]int)

Directories

Path Synopsis
Package rollsum implements rolling checksums similar to apenwarr's bup, which is similar to librsync.
Package rollsum implements rolling checksums similar to apenwarr's bup, which is similar to librsync.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL