bwt

package
v0.31.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 21, 2024 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package bwt is a package for performing burrows-wheeler transforms on sequences.

The BWT is a lossless compression algorithm that can be used to reduce the memory footprint of a sequence while still maintaining the ability to search, align, and extract the original sequence. This is useful for sequences so large that it would be beneficial to reduce its memory footprint while also maintaining a way to analyze and work with the sequence. BWT is used in both bioinformatics(burrows wheeler alignment) and data compression (bzip2).

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BWT

type BWT struct {
	// contains filtered or unexported fields
}

BWT Burrows-Wheeler Transform Compresses and Indexes a given sequence so that it can be be used for search, alignment, and text extraction. This is useful for sequences so large that it would be beneficial to reduce its memory footprint while also maintaining a way to analyze and work with the sequence.

Example (Basic)

This example shows how BWT can be used for exact pattern matching by returning the offsets at which the pattern exists. This can be useful for alignment when you need need to reduce the memory footprint of a reference sequence without loosing any data since BWT is a lossless compression.

package main

import (
	"fmt"
	"log"

	"github.com/bebop/poly/search/bwt"
	"golang.org/x/exp/slices"
)

func main() {
	inputSequence := "AACCTGCCGTCGGGGCTGCCCGTCGCGGGACGTCGAAACGTGGGGCGAAACGTG"

	bwt, err := bwt.New(inputSequence)
	if err != nil {
		log.Fatal(err)
	}

	offsets, err := bwt.Locate("GCC")
	if err != nil {
		log.Fatal(err)
	}
	slices.Sort(offsets)
	fmt.Println(offsets)
}
Output:

[5 17]

func New

func New(sequence string) (BWT, error)

New returns a BWT of the provided sequence The provided sequence must not contain the nullChar defined in this package. If it does, New will return an error.

func (BWT) Count

func (bwt BWT) Count(pattern string) (count int, err error)

Count represents the number of times the provided pattern shows up in the original sequence.

Example
package main

import (
	"fmt"
	"log"

	"github.com/bebop/poly/search/bwt"
)

func main() {
	inputSequence := "AACCTGCCGTCGGGGCTGCCCGTCGCGGGACGTCGAAACGTGGGGCGAAACGTG"

	bwt, err := bwt.New(inputSequence)
	if err != nil {
		log.Fatal(err)
	}

	count, err := bwt.Count("CG")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(count)
}
Output:

10

func (BWT) Extract

func (bwt BWT) Extract(start, end int) (extracted string, err error)

Extract this allows us to extract parts of the original sequence from the BWT. start is the beginning of the range of text to extract inclusive. end is the end of the range of text to extract exclusive. If either start or end are out of bounds, Extract will panic.

Example
package main

import (
	"fmt"
	"log"

	"github.com/bebop/poly/search/bwt"
)

func main() {
	inputSequence := "AACCTGCCGTCGGGGCTGCCCGTCGCGGGACGTCGAAACGTGGGGCGAAACGTG"

	bwt, err := bwt.New(inputSequence)
	if err != nil {
		log.Fatal(err)
	}

	extracted, err := bwt.Extract(48, 54)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(extracted)
}
Output:

AACGTG

func (BWT) GetTransform

func (bwt BWT) GetTransform() string

GetTransform returns the last column of the BWT transform of the original sequence.

Example
package main

import (
	"fmt"
	"log"

	"github.com/bebop/poly/search/bwt"
)

func main() {
	inputSequence := "banana"

	bwt, err := bwt.New(inputSequence)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(bwt.GetTransform())
}
Output:

annb$aa

func (BWT) Len

func (bwt BWT) Len() int

Len return the length of the sequence used to build the BWT

func (BWT) Locate

func (bwt BWT) Locate(pattern string) (offsets []int, err error)

Locate returns a list of offsets at which the beginning of the provided pattern occurs in the original sequence.

Example
package main

import (
	"fmt"
	"log"

	"github.com/bebop/poly/search/bwt"
	"golang.org/x/exp/slices"
)

func main() {
	inputSequence := "AACCTGCCGTCGGGGCTGCCCGTCGCGGGACGTCGAAACGTGGGGCGAAACGTG"

	bwt, err := bwt.New(inputSequence)
	if err != nil {
		log.Fatal(err)
	}

	offsets, err := bwt.Locate("CG")
	if err != nil {
		log.Fatal(err)
	}
	slices.Sort(offsets)
	fmt.Println(offsets)
}
Output:

[7 10 20 23 25 30 33 38 45 50]

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL