genome

package

Version: v0.5.0 (latest) | Published: Dec 18, 2024 | License: MIT | Imports: 12 | Imported by: 0

Repository

github.com/shenwei356/LexicMap

Documentation ¶

Index ¶

Variables
func RecycleGenome(g *Genome)
func RecycleTwoBit(b2 *[]byte)
func Seq2TwoBit(s []byte) *[]byte
func TwoBit2Seq(b2 []byte, bases int) ([]byte, error)
type Genome
- func (r *Genome) Reset()
- func (r Genome) String() string
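type Reader
- func NewReader(file string) (*Reader, error)
- func (r *Reader) Close() error
- func (r *Reader) GenomeInfo(idx int) (*Genome, error)
- func (r *Reader) Seq(idx int) (*Genome, error)
- func (r *Reader) SubSeq(idx int, start int, end int) (*Genome, error)
- func (r *Reader) SubSeq2(idx int, seqid []byte, start int, end int) (*Genome, int, error)
- func (r *Reader) SubSeq3(idx int, start int, end int, g *Genome) (*Genome, error)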
type Writer
- func NewWriter(file string, batch uint32) (*Writer, error)
- func (w *Writer) Close() error
- func (w *Writer) Write(s *Genome) error

Constants ¶

This section is empty.

Variables ¶

var BufferSize = 65536 // os.Getpagesize()

BufferSize is the size of the reading and writing buffers.

var ErrBrokenFile = errors.New("genome data: broken file")

ErrBrokenFile means the file is not complete.

var ErrEmptySeq = errors.New("genome data: empty seq")

ErrEmptySeq means the sequence is empty.

var ErrInvalidFileFormat = errors.New("genome data: invalid binary format")

ErrInvalidFileFormat means invalid file format.

var ErrInvalidTwoBitData = errors.New("genome data: invalid two-bit data")

ErrInvalidTwoBitData means the length of the two-bit sequence slice does not match the number of bases.

var ErrVersionMismatch = errors.New("genome data: version mismatch")

ErrVersionMismatch means the version of the files does not match the version of the program.

var GenomeIndexFileExt = ".idx"

GenomeIndexFileExt is the file extension of the genome data index file.

var Magic = [8]byte{'.', 'g', 'e', 'n', 'o', 'm', 'e', 's'}

Magic is the magic number for checking the file format of the genome data file.

var MagicIdx = [8]byte{'.', 'g', 'e', 'n', 'o', 'm', 'e', 'i'}

MagicIdx is the magic number for the index file.

var MainVersion uint8 = 0

MainVersion is used for checking compatibility.

var MinorVersion uint8 = 1

MinorVersion is less important

var PoolGenome = &sync.Pool{New: func() interface{} {
	return &Genome{
		ID:  make([]byte, 0, 128),
		Seq: make([]byte, 0, 20<<20),

		GenomeSize: 0,
		SeqSizes:   make([]int, 0, 128),

		Done: make(chan int),
	}
}}

PoolGenome is the object pool for Genome
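The get, use, recycle cycle that an object pool like this supports can be sketched as follows. This is an illustration only: record, getRecord and recycleRecord are hypothetical stand-ins for Genome and its helpers, and whether the real package resets an object when it is taken from the pool or when it is returned is not stated here, so the sketch simply resets on Get.

```go
package main

import (
	"fmt"
	"sync"
)

// record is a hypothetical, reduced stand-in for Genome.
type record struct {
	ID  []byte
	Seq []byte
}

// reset truncates the slices but keeps their capacity for reuse.
func (r *record) reset() {
	r.ID = r.ID[:0]
	r.Seq = r.Seq[:0]
}

// pool preallocates slice capacity once per object, as PoolGenome does.
var pool = &sync.Pool{New: func() interface{} {
	return &record{
		ID:  make([]byte, 0, 128),
		Seq: make([]byte, 0, 1<<10),
	}
}}

// getRecord takes an object from the pool and clears it before use.
func getRecord() *record {
	r := pool.Get().(*record)
	r.reset()
	return r
}

// recycleRecord returns the object to the pool; the caller must not use it afterwards.
func recycleRecord(r *record) {
	pool.Put(r)
}

func main() {
	r := getRecord()
	r.ID = append(r.ID, "genome1"...)
	r.Seq = append(r.Seq, "ACGT"...)
	fmt.Println(string(r.ID), len(r.Seq))
	recycleRecord(r)

	r2 := getRecord() // may or may not be the same object as r
	fmt.Println(len(r2.ID), len(r2.Seq))
	recycleRecord(r2)
}
```

The point of the pattern is that large buffers such as the sequence slice are allocated once and reused, which is why results returned by the Reader methods should be handed back with RecycleGenome.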

Functions ¶

func Seq2TwoBit ¶

func Seq2TwoBit(s []byte) *[]byte

Seq2TwoBit converts a DNA sequence to 2bit-packed sequence.

func TwoBit2Seq ¶

func TwoBit2Seq(b2 []byte, bases int) ([]byte, error)

TwoBit2Seq converts a 2bit-packed sequence to DNA.
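To make the idea of 2-bit packing concrete, here is a minimal sketch. It is not the package's implementation: packTwoBit and unpackTwoBit are hypothetical names, and the encoding (A=0, C=1, G=2, T=3, four bases per byte, first base in the highest two bits, any other byte stored as A) is an assumption chosen for illustration. The base count passed to the decoder plays the same role as the bases argument of TwoBit2Seq, because the last byte may be only partly filled.

```go
package main

import (
	"errors"
	"fmt"
)

// errLengthMismatch is a local stand-in for the role of ErrInvalidTwoBitData.
var errLengthMismatch = errors.New("two-bit data length does not match number of bases")

// packTwoBit packs each base into 2 bits, four bases per byte.
// Assumed mapping: A=0, C=1, G=2, T=3; any other byte is stored as 0 (A).
func packTwoBit(s []byte) []byte {
	out := make([]byte, (len(s)+3)/4)
	for i, b := range s {
		var code byte
		switch b {
		case 'C', 'c':
			code = 1
		case 'G', 'g':
			code = 2
		case 'T', 't':
			code = 3
		}
		// The first base of each group of four goes into the highest two bits.
		out[i/4] |= code << uint(6-2*(i%4))
	}
	return out
}

// unpackTwoBit reverses packTwoBit for the given number of bases.
func unpackTwoBit(b2 []byte, bases int) ([]byte, error) {
	if bases < 0 || len(b2) != (bases+3)/4 {
		return nil, errLengthMismatch
	}
	const alphabet = "ACGT"
	out := make([]byte, bases)
	for i := range out {
		code := (b2[i/4] >> uint(6-2*(i%4))) & 3
		out[i] = alphabet[code]
	}
	return out, nil
}

func main() {
	packed := packTwoBit([]byte("ACGTAC"))
	seq, err := unpackTwoBit(packed, 6)
	fmt.Printf("%d bytes, %s, %v\n", len(packed), seq, err) // 2 bytes, ACGTAC, <nil>
}
```

Under this scheme a sequence of n bases occupies ceil(n/4) bytes, a fourfold reduction over one byte per base.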

Types ¶

type Genome ¶

type Genome struct {
	ID  []byte // genome ID
	Seq []byte // sequence, bases

	GenomeSize int       // bases of all sequences
	Len        int       // length of concatenated sequences
	NumSeqs    int       // number of sequences
	SeqSizes   []int     // sizes of sequences
	SeqIDs     []*[]byte // IDs of all sequences

	// only used in index building
	Kmers     *[]uint64 // lexichash mask result
	Locses    *[][]int  // lexichash mask result
	TwoBit    *[]byte   // bit-packed sequence
	StartTime time.Time

	GenomeIdx int // only for collecting Batch+Genome Index of split genome chunks, not saved in index

	// seed positions to write to the file
	Locs       *[]uint32
	ExtraKmers *[]*[]uint64 // 3*n. (kmer, loc)

	// for making sure both genome and key-value data being written
	Done chan int

	// offset of sequence, only used when calling SubSeq more than once
	SeqOffSet int64
}

Genome represents a reference sequence to insert and a matched subsequence

func (*Genome) Reset ¶

func (r *Genome) Reset()

Reset resets the Genome.

func (Genome) String ¶

func (r Genome) String() string

type Reader ¶

type Reader struct {
	Index []uint64 // index data of all genome records, (offset, nbases)
	// contains filtered or unexported fields
}

Reader is for fast extracting of subsequence of any sequence in the data file.

func NewReader ¶

func NewReader(file string) (*Reader, error)

NewReader returns a reader from a genome file. The reader is recycled after calling Close().

func (*Reader) Close ¶

func (r *Reader) Close() error

Close closes and recycles the reader.

func (*Reader) GenomeInfo ¶ added in v0.4.0

func (r *Reader) GenomeInfo(idx int) (*Genome, error)

GenomeInfo returns the genome information of a genome (idx is 0-based). Please call RecycleGenome() after using the result.

func (*Reader) Seq ¶

func (r *Reader) Seq(idx int) (*Genome, error)

Seq returns the sequence of the genome with the given index (0-based).

func (*Reader) SubSeq ¶

func (r *Reader) SubSeq(idx int, start int, end int) (*Genome, error)

SubSeq returns the subsequence of a genome (idx is 0-based), from start to end (both are 0-based and included). Please call RecycleGenome() after using the result.

func (*Reader) SubSeq2 ¶

func (r *Reader) SubSeq2(idx int, seqid []byte, start int, end int) (*Genome, int, error)

SubSeq2 returns the subsequence of one genome (idx is 0-based), from start to end (both are 0-based and included). It also returns the actual end position (0-based). Please call RecycleGenome() after using the result.
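The coordinate convention used by the SubSeq methods (0-based, both ends included) differs from Go's half-open slices, so a small sketch may help. subSeqInclusive is a hypothetical helper working on an in-memory slice, not the Reader's code, and clamping an end past the last base is an assumption about why an "actual end position" would be returned.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadRange is a hypothetical error for this sketch.
var errBadRange = errors.New("invalid range")

// subSeqInclusive returns seq[start..end], with both positions 0-based and
// included, plus the end position actually used.
// Assumption: an end past the last base is clamped to the last base.
func subSeqInclusive(seq []byte, start, end int) ([]byte, int, error) {
	if len(seq) == 0 || start < 0 || start > end || start >= len(seq) {
		return nil, 0, errBadRange
	}
	if end >= len(seq) {
		end = len(seq) - 1
	}
	// An inclusive end maps to end+1 in Go's half-open slice syntax.
	return seq[start : end+1], end, nil
}

func main() {
	seq := []byte("ACGTACGT")

	s, e, _ := subSeqInclusive(seq, 2, 4)
	fmt.Println(string(s), e) // GTA 4

	s, e, _ = subSeqInclusive(seq, 6, 100)
	fmt.Println(string(s), e) // GT 7
}
```

With inclusive coordinates the length of the result is end - start + 1, which is easy to get wrong by one when converting to or from slice indices.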

func (*Reader) SubSeq3 ¶ added in v0.5.0

func (r *Reader) SubSeq3(idx int, start int, end int, g *Genome) (*Genome, error)

SubSeq3 returns the subsequence of a genome (idx is 0-based), from start to end (both are 0-based and included). Please call RecycleGenome() after using the result.

type Writer ¶

type Writer struct {
	// contains filtered or unexported fields
}

Writer saves a list of DNA sequences in 2bit-encoded format, along with their genome information.

func NewWriter ¶

func NewWriter(file string, batch uint32) (*Writer, error)

NewWriter creates a new Writer. Batch is the batch id for this data file.

func (*Writer) Close ¶

func (w *Writer) Close() error

Close writes the index file and finishes the writing.

func (*Writer) Write ¶

func (w *Writer) Write(s *Genome) error

Write writes one genome. After calling this, you need to call RecycleGenome to recycle the genome.

Source Files ¶

genome.go
