Documentation ¶
Index ¶
- Variables
- func RecycleGenome(g *Genome)
- func RecycleTwoBit(b2 *[]byte)
- func Seq2TwoBit(s []byte) *[]byte
- func TwoBit2Seq(b2 []byte, bases int) ([]byte, error)
- type Genome
- type Reader
- func (r *Reader) Close() error
- func (r *Reader) GenomeInfo(idx int) (*Genome, error)
- func (r *Reader) Seq(idx int) (*Genome, error)
- func (r *Reader) SubSeq(idx int, start int, end int) (*Genome, error)
- func (r *Reader) SubSeq2(idx int, seqid []byte, start int, end int) (*Genome, int, error)
- func (r *Reader) SubSeq3(idx int, start int, end int, g *Genome) (*Genome, error)
- type Writer
Constants ¶
This section is empty.
Variables ¶
var BufferSize = 65536 // os.Getpagesize()
BufferSize is the size of the read and write buffers.
var ErrBrokenFile = errors.New("genome data: broken file")
ErrBrokenFile means the file is not complete.
var ErrEmptySeq = errors.New("genome data: empty seq")
ErrEmptySeq means the sequence is empty.
var ErrInvalidFileFormat = errors.New("genome data: invalid binary format")
ErrInvalidFileFormat means invalid file format.
var ErrInvalidTwoBitData = errors.New("genome data: invalid two-bit data")
ErrInvalidTwoBitData means the length of the two-bit sequence slice does not match the number of bases.
var ErrVersionMismatch = errors.New("genome data: version mismatch")
ErrVersionMismatch means the version of the files does not match the version of the program.
var GenomeIndexFileExt = ".idx"
GenomeIndexFileExt is the file extension of the genome data index file.
var Magic = [8]byte{'.', 'g', 'e', 'n', 'o', 'm', 'e', 's'}
Magic is the magic number for checking the file format.
var MagicIdx = [8]byte{'.', 'g', 'e', 'n', 'o', 'm', 'e', 'i'}
MagicIdx is the magic number for the index file.
var MainVersion uint8 = 0
MainVersion is used for checking compatibility.
var MinorVersion uint8 = 1
MinorVersion is less important
var PoolGenome = &sync.Pool{New: func() interface{} {
	return &Genome{
		ID:         make([]byte, 0, 128),
		Seq:        make([]byte, 0, 20<<20),
		GenomeSize: 0,
		SeqSizes:   make([]int, 0, 128),
		Done:       make(chan int),
	}
}}
PoolGenome is the object pool for Genome.
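The usual sync.Pool pattern is to take an object with Get, use it, reset it, and hand it back with Put. The sketch below illustrates that pattern with a simplified, hypothetical record type standing in for Genome. The recycle helper only shows what a function such as RecycleGenome might do; how the package actually resets a Genome is not documented here.

```go
package main

import (
	"fmt"
	"sync"
)

// record is a simplified, hypothetical stand-in for Genome.
type record struct {
	ID  []byte
	Seq []byte
}

// pool pre-allocates slice capacity so reused objects avoid reallocation.
var pool = &sync.Pool{New: func() interface{} {
	return &record{
		ID:  make([]byte, 0, 128),
		Seq: make([]byte, 0, 1024),
	}
}}

// recycle truncates the slices to zero length, keeping their capacity,
// and returns the object to the pool.
func recycle(r *record) {
	r.ID = r.ID[:0]
	r.Seq = r.Seq[:0]
	pool.Put(r)
}

func main() {
	r := pool.Get().(*record)
	r.ID = append(r.ID, "chr1"...)
	r.Seq = append(r.Seq, "ACGT"...)
	fmt.Printf("%s %s\n", r.ID, r.Seq)
	recycle(r) // r must not be used after this point
}
```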
Functions ¶
func Seq2TwoBit ¶
Seq2TwoBit converts a DNA sequence to a 2bit-packed sequence.
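To illustrate the general idea of 2-bit packing, here is a minimal sketch that stores four bases per byte. The bit layout (A=0, C=1, G=2, T=3, first base in the highest two bits) and the handling of other bytes are assumptions made for this example; the encoding actually used by Seq2TwoBit and TwoBit2Seq may differ. The pack and unpack helpers are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

var errBadLength = errors.New("packed length does not match number of bases")

// code maps a base to a 2-bit value. The assignment is an assumption;
// any byte other than C, G, or T (either case) is treated as A here.
func code(b byte) byte {
	switch b {
	case 'C', 'c':
		return 1
	case 'G', 'g':
		return 2
	case 'T', 't':
		return 3
	}
	return 0
}

// pack stores four bases per byte, first base in the highest two bits.
func pack(seq []byte) []byte {
	out := make([]byte, (len(seq)+3)/4)
	for i, b := range seq {
		out[i/4] |= code(b) << uint(6-2*(i%4))
	}
	return out
}

// unpack restores the given number of bases from packed data. The base
// count is needed because the last byte may be only partly filled.
func unpack(packed []byte, bases int) ([]byte, error) {
	if bases < 0 || len(packed) != (bases+3)/4 {
		return nil, errBadLength
	}
	const alphabet = "ACGT"
	seq := make([]byte, bases)
	for i := range seq {
		seq[i] = alphabet[(packed[i/4]>>uint(6-2*(i%4)))&3]
	}
	return seq, nil
}

func main() {
	packed := pack([]byte("ACGTAC"))
	seq, err := unpack(packed, 6)
	fmt.Println(len(packed), string(seq), err)
}
```

The length check in unpack mirrors the condition that ErrInvalidTwoBitData describes: the packed slice must hold exactly enough bytes for the requested number of bases.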
Types ¶
type Genome ¶
type Genome struct {
	ID  []byte // genome ID
	Seq []byte // sequence, bases

	GenomeSize int       // bases of all sequences
	Len        int       // length of concatenated sequences
	NumSeqs    int       // number of sequences
	SeqSizes   []int     // sizes of sequences
	SeqIDs     []*[]byte // IDs of all sequences

	// only used in index building
	Kmers  *[]uint64 // lexichash mask result
	Locses *[][]int  // lexichash mask result
	TwoBit *[]byte   // bit-packed sequence

	StartTime time.Time

	GenomeIdx int // only for collecting Batch+Genome Index of split genome chunks, not saved in index

	// seed positions to write to the file
	Locs       *[]uint32
	ExtraKmers *[]*[]uint64 // 3*n. (kmer, loc)

	// for making sure both genome and key-value data being written
	Done chan int

	// offset of sequence, only used in calling SubSeq for more than once
	SeqOffSet int64
}
Genome represents a reference sequence to insert and a matched subsequence.
type Reader ¶
type Reader struct {
	Index []uint64 // index data of all genome records, (offset, nbases)
	// contains filtered or unexported fields
}
Reader is for fast extraction of a subsequence of any sequence in the data file.
func NewReader ¶
NewReader returns a reader from a genome file. The reader is recycled after calling Close().
func (*Reader) GenomeInfo ¶ added in v0.4.0
GenomeInfo returns the genome information of a genome (idx is 0-based). Please call RecycleGenome() after using the result.
func (*Reader) SubSeq ¶
SubSeq returns the subsequence of a genome (idx is 0-based), from start to end (both are 0-based and inclusive). Please call RecycleGenome() after using the result.
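Because both coordinates are 0-based and the end position is included, a range start..end covers end-start+1 bases, which differs from Go's half-open slice expressions. The sketch below shows the conversion on a plain byte slice. The subSeq helper and its range check are hypothetical; how the real SubSeq treats out-of-range coordinates is not documented here.

```go
package main

import (
	"errors"
	"fmt"
)

var errRange = errors.New("subseq: invalid range")

// subSeq is a hypothetical helper that returns the bases from start to
// end, both 0-based and inclusive.
func subSeq(seq []byte, start, end int) ([]byte, error) {
	if start < 0 || end < start || end >= len(seq) {
		return nil, errRange
	}
	// Go slices are half-open, so the inclusive end needs +1.
	return seq[start : end+1], nil
}

func main() {
	seq := []byte("ACGTACGT")
	sub, err := subSeq(seq, 2, 4)
	fmt.Println(string(sub), len(sub), err) // GTA 3 <nil>
}
```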
type Writer ¶
type Writer struct {
// contains filtered or unexported fields
}
Writer saves a list of DNA sequences into 2bit-encoded format, along with its genome information.