Documentation ¶
Overview ¶
Package bwt is a package for performing burrows-wheeler transforms on sequences.
The BWT is a lossless compression algorithm that can be used to reduce the memory footprint of a sequence while still maintaining the ability to search, align, and extract the original sequence. This is useful for sequences so large that it would be beneficial to reduce its memory footprint while also maintaining a way to analyze and work with the sequence. BWT is used in both bioinformatics(burrows wheeler alignment) and data compression (bzip2).
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BWT ¶
type BWT struct {
// contains filtered or unexported fields
}
BWT Burrows-Wheeler Transform Compresses and Indexes a given sequence so that it can be be used for search, alignment, and text extraction. This is useful for sequences so large that it would be beneficial to reduce its memory footprint while also maintaining a way to analyze and work with the sequence.
Example (Basic) ¶
This example shows how BWT can be used for exact pattern matching by returning the offsets at which the pattern exists. This can be useful for alignment when you need need to reduce the memory footprint of a reference sequence without loosing any data since BWT is a lossless compression.
package main import ( "fmt" "log" "github.com/bebop/poly/search/bwt" "golang.org/x/exp/slices" ) func main() { inputSequence := "AACCTGCCGTCGGGGCTGCCCGTCGCGGGACGTCGAAACGTGGGGCGAAACGTG" bwt, err := bwt.New(inputSequence) if err != nil { log.Fatal(err) } offsets, err := bwt.Locate("GCC") if err != nil { log.Fatal(err) } slices.Sort(offsets) fmt.Println(offsets) }
Output: [5 17]
func New ¶
New returns a BWT of the provided sequence The provided sequence must not contain the nullChar defined in this package. If it does, New will return an error.
func (BWT) Count ¶
Count represents the number of times the provided pattern shows up in the original sequence.
Example ¶
package main import ( "fmt" "log" "github.com/bebop/poly/search/bwt" ) func main() { inputSequence := "AACCTGCCGTCGGGGCTGCCCGTCGCGGGACGTCGAAACGTGGGGCGAAACGTG" bwt, err := bwt.New(inputSequence) if err != nil { log.Fatal(err) } count, err := bwt.Count("CG") if err != nil { log.Fatal(err) } fmt.Println(count) }
Output: 10
func (BWT) Extract ¶
Extract this allows us to extract parts of the original sequence from the BWT. start is the beginning of the range of text to extract inclusive. end is the end of the range of text to extract exclusive. If either start or end are out of bounds, Extract will panic.
Example ¶
package main import ( "fmt" "log" "github.com/bebop/poly/search/bwt" ) func main() { inputSequence := "AACCTGCCGTCGGGGCTGCCCGTCGCGGGACGTCGAAACGTGGGGCGAAACGTG" bwt, err := bwt.New(inputSequence) if err != nil { log.Fatal(err) } extracted, err := bwt.Extract(48, 54) if err != nil { log.Fatal(err) } fmt.Println(extracted) }
Output: AACGTG
func (BWT) GetTransform ¶
GetTransform returns the last column of the BWT transform of the original sequence.
Example ¶
package main import ( "fmt" "log" "github.com/bebop/poly/search/bwt" ) func main() { inputSequence := "banana" bwt, err := bwt.New(inputSequence) if err != nil { log.Fatal(err) } fmt.Println(bwt.GetTransform()) }
Output: annb$aa
func (BWT) Locate ¶
Locate returns a list of offsets at which the beginning of the provided pattern occurs in the original sequence.
Example ¶
package main import ( "fmt" "log" "github.com/bebop/poly/search/bwt" "golang.org/x/exp/slices" ) func main() { inputSequence := "AACCTGCCGTCGGGGCTGCCCGTCGCGGGACGTCGAAACGTGGGGCGAAACGTG" bwt, err := bwt.New(inputSequence) if err != nil { log.Fatal(err) } offsets, err := bwt.Locate("CG") if err != nil { log.Fatal(err) } slices.Sort(offsets) fmt.Println(offsets) }
Output: [7 10 20 23 25 30 33 38 45 50]