Documentation ¶
Overview ¶
Package fai implements FASTA sequence file index handling, including creating and reading the index and random access to sequences.
The code for the fai data structure was copied and edited from [1].
I wrote the code for creating and reading fai files, as well as the test code.
The code for random access to subsequences was copied from [2], but I extended it considerably.
Reference:
[1]. https://github.com/biogo/biogo/blob/master/io/seqio/fai/fai.go
[2]. https://github.com/brentp/faidx/blob/master/faidx.go
## General Usage
    import "github.com/shenwei356/bio/seqio/fai"

    file := "seq.fa"
    faidx, err := fai.New(file)
    checkErr(err)
    defer func() {
        checkErr(faidx.Close())
    }()

    // whole sequence
    seq, err := faidx.Seq("cel-mir-2")
    checkErr(err)

    // single base
    s, err := faidx.Base("cel-let-7", 1)
    checkErr(err)

    // subsequence; start and end are both 1-based
    // (plain assignment: seq and err are already declared above)
    seq, err = faidx.SubSeq("cel-mir-2", 15, 19)
    checkErr(err)
## Extended SubSeq
For the extended SubSeq, negative positions are allowed.
This is my custom locating strategy. Start and end are both 1-based. To better understand the locating strategy, see the examples below:

     1-based index    1 2 3 4 5 6 7 8 9 10
    negative index    0-9-8-7-6-5-4-3-2-1
               seq    A C G T N a c g t n
               1:1    A
               2:4      C G T
             -4:-2                c g t
             -4:-1                c g t n
             -1:-1                      n
              2:-2      C G T N a c g t
              1:-1    A C G T N a c g t n
              1:12    A C G T N a c g t n
            -12:-1    A C G T N a c g t n
Examples:
    // last 12 bases
    seq, err := faidx.SubSeq("cel-mir-2", -12, -1)
    checkErr(err)
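The locating strategy in the table can be approximated with a few lines of arithmetic. The following is only a minimal sketch: `resolveRange` is a hypothetical helper, not the package's `SubLocation`, and it assumes that a negative position counts from the end (-1 is the last base), that out-of-range positions are clamped, and that position 0 is rejected (a simplification; the real function may treat 0 differently).

```go
package main

import "fmt"

// resolveRange is a hypothetical helper (NOT the package's SubLocation).
// It converts a 1-based, possibly negative (start, end) pair into a clamped,
// 1-based inclusive range on a sequence of the given length.
// ok is false when the range selects nothing.
func resolveRange(length, start, end int) (s, e int, ok bool) {
	// Assumption of this sketch: position 0 is treated as invalid.
	if length <= 0 || start == 0 || end == 0 {
		return 0, 0, false
	}
	if start < 0 {
		start = length + start + 1 // -1 maps to the last base
	}
	if end < 0 {
		end = length + end + 1
	}
	if start < 1 {
		start = 1 // e.g. -12 on a 10-base sequence clamps to 1
	}
	if end > length {
		end = length // e.g. 12 on a 10-base sequence clamps to 10
	}
	if start > end {
		return 0, 0, false
	}
	return start, end, true
}

func main() {
	seq := "ACGTNacgtn"
	ranges := [][2]int{{1, 1}, {2, 4}, {-4, -2}, {2, -2}, {1, 12}, {-12, -1}}
	for _, r := range ranges {
		if s, e, ok := resolveRange(len(seq), r[0], r[1]); ok {
			fmt.Printf("%d:%d\t%s\n", r[0], r[1], seq[s-1:e])
		}
	}
}
```

Clamping rather than returning an error is what lets rows such as `1:12` and `-12:-1` in the table yield the whole sequence.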
## Advanced Usage
Function `fai.New(file string)` is a wrapper that simplifies creating and reading a FASTA index. Let's see what happens inside:

    func New(file string) (*Faidx, error) {
        fileFai := file + ".fai"
        var index Index
        if _, err := os.Stat(fileFai); os.IsNotExist(err) {
            // no index file yet: build one from the FASTA file
            index, err = Create(file)
            if err != nil {
                return nil, err
            }
        } else {
            // index file exists: load it
            index, err = Read(fileFai)
            if err != nil {
                return nil, err
            }
        }
        return NewWithIndex(file, index)
    }
By default, the sequence ID is used as the key in the FASTA index file. Inside the package, a regular expression extracts the sequence ID from the full header. The default value is `^([^\s]+)\s?`, i.e. it captures the leading run of non-whitespace characters of the header. So you can simply use `fai.Create(file string)` to create the .fai file.
If you want to use the full header instead of the sequence ID, you can use `fai.CreateWithIDRegexp(file string, idRegexp string)` to create the index. In that case, `idRegexp` should be `^(.+)$`. For convenience, you can use another function, `CreateWithFullHead`.
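To see the difference between the two expressions, the sketch below applies both to a made-up header line using only the standard `regexp` package. `extractID`, `idRe`, `fullRe` and the sample header are hypothetical names and data for illustration; how the package applies the expression internally is not shown here.

```go
package main

import (
	"fmt"
	"regexp"
)

// The two patterns quoted in the text above.
var (
	idRe   = regexp.MustCompile(`^([^\s]+)\s?`) // default: sequence ID only
	fullRe = regexp.MustCompile(`^(.+)$`)       // full header
)

// extractID returns the first capture group of re applied to a header line
// (without the leading '>'), or the whole header if re does not match.
// It is a hypothetical helper for illustration only.
func extractID(re *regexp.Regexp, header string) string {
	if m := re.FindStringSubmatch(header); len(m) > 1 {
		return m[1]
	}
	return header
}

func main() {
	header := "seq1 sample description" // made-up header
	fmt.Println(extractID(idRe, header))
	fmt.Println(extractID(fullRe, header))
}
```

With the default pattern, the key is `seq1`; with `^(.+)$`, the key is the entire header, so lookups must use the full header text.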
## More Advanced Usages
Note that ***by default, the whole file is mapped into shared memory***, which is fine for small files (smaller than your RAM). For very large files, you should disable this; file seeking is then used instead.

    // change the global variable
    fai.MapWholeFile = false
    // then do other things
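For intuition about the seeking mode, here is a rough standard-library sketch of reading a byte range from a file without loading it all. `readRange` and `writeTemp` are hypothetical helpers, not part of the fai API, and the package's actual seeking code may differ.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readRange reads n bytes starting at byte offset off by seeking,
// so memory use is bounded by n rather than by the file size.
// Hypothetical helper; not part of the fai package.
func readRange(path string, off int64, n int) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	if _, err := f.Seek(off, io.SeekStart); err != nil {
		return nil, err
	}
	buf := make([]byte, n)
	// io.ReadFull fails if fewer than n bytes are available.
	if _, err := io.ReadFull(f, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// writeTemp writes content to a temporary file and returns its path
// together with a cleanup function that removes the file.
func writeTemp(content string) (string, func(), error) {
	f, err := os.CreateTemp("", "sketch-*.fa")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.Remove(f.Name()) }
	if _, err := f.WriteString(content); err != nil {
		f.Close()
		cleanup()
		return "", nil, err
	}
	return f.Name(), cleanup, f.Close()
}

func main() {
	path, cleanup, err := writeTemp(">seq1\nACGTNacgtn\n")
	if err != nil {
		panic(err)
	}
	defer cleanup()

	// ">seq1\n" is 6 bytes long, so the sequence starts at offset 6.
	b, err := readRange(path, 6, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Seeking costs a system call per access, but it keeps memory usage independent of file size, which is the trade-off the flag exposes.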
Index ¶
- Variables
- func SubLocation(length, start, end int) (int, int, bool)
- type Faidx
- func (f *Faidx) Base(chr string, pos int) (byte, error)
- func (f *Faidx) Close() error
- func (f *Faidx) Seq(chr string) ([]byte, error)
- func (f *Faidx) SeqNotCleaned(chr string) ([]byte, error)
- func (f *Faidx) SubSeq(chr string, start int, end int) ([]byte, error)
- func (f *Faidx) SubSeqNotCleaned(chr string, start int, end int) ([]byte, error)
- type Index
- type Record
Constants ¶
This section is empty.
Variables ¶
var ErrSeqNotExists = fmt.Errorf("sequence not exists")
ErrSeqNotExists means that the sequence does not exist.
var IDRegexp = regexp.MustCompile(defaultIDRegexp)
IDRegexp is the regular expression for parsing the record ID.
var MapWholeFile = true
MapWholeFile is a global flag that decides whether to map the whole file.
Functions ¶
func SubLocation ¶
SubLocation is my sublocation strategy. The input start and end, and the returned start and end, are all 1-based.

     1-based index    1 2 3 4 5 6 7 8 9 10
    negative index    0-9-8-7-6-5-4-3-2-1
               seq    A C G T N a c g t n
               1:1    A
               2:4      C G T
             -4:-2                c g t
             -4:-1                c g t n
             -1:-1                      n
              2:-2      C G T N a c g t
              1:-1    A C G T N a c g t n
              1:12    A C G T N a c g t n
            -12:-1    A C G T N a c g t n
Types ¶
type Faidx ¶
    type Faidx struct {
        Index Index
        // contains filtered or unexported fields
    }
Faidx is
func NewWithCustomExt ¶
NewWithCustomExt tries to get a Faidx from a FASTA file, with the .fai file specified explicitly.
func NewWithIndex ¶
NewWithIndex returns a Faidx from a file and an already-read Index. Useful when using a custom IDRegexp.
func (*Faidx) SeqNotCleaned ¶
SeqNotCleaned returns the sequence without removing "\r" and "\n".
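The "cleaning" that the non-cleaned variants skip amounts to dropping line-break bytes from the raw slice. A minimal sketch of that step, using a hypothetical `clean` helper rather than the package's own code:

```go
package main

import "fmt"

// clean returns a copy of raw with all '\r' and '\n' bytes removed.
// Hypothetical helper for illustration; not part of the fai package.
func clean(raw []byte) []byte {
	out := make([]byte, 0, len(raw))
	for _, b := range raw {
		if b != '\r' && b != '\n' {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	raw := []byte("ACGTN\r\nacgtn\n") // bytes as stored in a line-wrapped FASTA file
	fmt.Printf("%q -> %q\n", raw, clean(raw))
}
```

Keeping the raw bytes avoids this extra copy, which may matter when the caller handles line breaks itself.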
type Index ¶
Index is the FASTA index.
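As background on why an index enables random access: in the widely used samtools-style .fai layout, each line stores the sequence name, its length, the byte offset of its first base, the number of bases per line, and the number of bytes per line. The sketch below shows the usual offset arithmetic under that layout. The `record` type and its field names are hypothetical and may not match the package's `Record`.

```go
package main

import "fmt"

// record mirrors the five columns of a samtools-style .fai line.
// Field names are hypothetical; the package's Record type may differ.
type record struct {
	Name      string
	Length    int   // number of bases in the sequence
	Offset    int64 // byte offset of the first base in the FASTA file
	LineBases int   // bases per line
	LineWidth int   // bytes per line, including the line terminator
}

// byteOffset returns the file offset of the base at 1-based position pos.
func (r record) byteOffset(pos int) int64 {
	p := pos - 1
	// full lines skipped * bytes per line + column within the current line
	return r.Offset + int64(p/r.LineBases*r.LineWidth+p%r.LineBases)
}

func main() {
	fasta := ">seq1\nACGTN\nacgtn\n"
	// ">seq1\n" occupies 6 bytes; 5 bases per line; lines end with "\n".
	r := record{Name: "seq1", Length: 10, Offset: 6, LineBases: 5, LineWidth: 6}
	for _, pos := range []int{1, 5, 6, 10} {
		off := r.byteOffset(pos)
		fmt.Printf("pos %d -> byte %d (%c)\n", pos, off, fasta[off])
	}
}
```

Because the offset is computed directly, fetching a base or a subsequence does not require scanning the file from the start.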
func CreateWithFullHead ¶
CreateWithFullHead uses the full header instead of just the sequence ID.
func CreateWithIDRegexp ¶
CreateWithIDRegexp uses a custom regular expression to get the sequence ID.