Documentation ¶
Overview ¶
Package bgzf implements BGZF format reading and writing according to the SAM specification.
The specification is available at https://github.com/samtools/hts-specs.
Constants ¶
const (
	BlockSize    = 0x0ff00 // The maximum size of an uncompressed input data block.
	MaxBlockSize = 0x10000 // The maximum size of a compressed output block.
)
Variables ¶
var (
	ErrClosed            = errors.New("bgzf: use of closed writer")
	ErrCorrupt           = errors.New("bgzf: corrupt block")
	ErrBlockOverflow     = errors.New("bgzf: block overflow")
	ErrWrongFileType     = errors.New("bgzf: file is a directory")
	ErrNoEnd             = errors.New("bgzf: cannot determine offset from end")
	ErrNotASeeker        = errors.New("bgzf: not a seeker")
	ErrContaminatedCache = errors.New("bgzf: cache owner mismatch")
	ErrNoBlockSize       = errors.New("bgzf: could not determine block size")
	ErrBlockSizeMismatch = errors.New("bgzf: unexpected block size")
)
Functions ¶
Types ¶
type Block ¶
type Block interface {
	// Base returns the file offset of the start of
	// the gzip member from which the Block data was
	// decompressed.
	Base() int64

	io.Reader
	io.ByteReader

	// Used returns whether one or more bytes have
	// been read from the Block.
	Used() bool

	// NextBase returns the expected position of the next
	// BGZF block. It returns -1 if the Block is not valid.
	NextBase() int64
	// contains filtered or unexported methods
}
Block wraps interaction with decompressed BGZF data blocks.
type Cache ¶
type Cache interface {
	// Get returns the Block in the Cache with the specified
	// base or a nil Block if it does not exist. The returned
	// Block must be removed from the Cache.
	Get(base int64) Block

	// Put inserts a Block into the Cache, returning the Block
	// that was evicted or nil if no eviction was necessary and
	// a boolean indicating whether the put Block was retained
	// by the Cache.
	Put(Block) (evicted Block, retained bool)

	// Peek returns whether a Block exists in the cache for the
	// given base. If a Block satisfies the request, then exists
	// is returned as true with the offset for the next Block in
	// the stream, otherwise false and -1.
	Peek(base int64) (exists bool, next int64)
}
Cache is a Block caching type. Basic cache implementations are provided in the cache package. A Cache must be safe for concurrent use.
If a Cache is a Wrapper, its Wrap method is called on newly created blocks.
type Reader ¶
type Reader struct {
	gzip.Header

	// Blocked specifies the behaviour of the
	// Reader at the end of a BGZF member.
	// If the Reader is Blocked, a Read that
	// reaches the end of a BGZF block will
	// return io.EOF. This error is not sticky,
	// so a subsequent Read will progress to
	// the next block if it is available.
	Blocked bool
	// contains filtered or unexported fields
}
Reader implements BGZF blocked gzip decompression.
func NewReader ¶
NewReader returns a new BGZF reader.
The number of concurrent read decompressors is specified by rd. If rd is 0, GOMAXPROCS concurrent decompressors will be created. The returned Reader should be closed after use to avoid leaking resources.
func (*Reader) BlockLen ¶
BlockLen returns the number of bytes remaining to be read from the current BGZF block.
func (*Reader) LastChunk ¶
LastChunk returns the region of the BGZF file read by the last successful read operation or the resulting virtual offset of the last successful seek operation.
func (*Reader) ReadByte ¶ added in v1.4.0
ReadByte implements the io.ByteReader interface.
Example ¶
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/biogo/hts/bgzf"
)

func main() {
	// Write Tom Sawyer into a bgzf buffer.
	var buf bytes.Buffer
	w := bgzf.NewWriter(&buf, 1)
	f, err := os.Open("testdata/Mark.Twain-Tom.Sawyer.txt")
	if err != nil {
		log.Fatalf("failed to open file: %v", err)
	}
	defer f.Close()
	_, err = io.Copy(w, f)
	if err != nil {
		log.Fatalf("failed to copy file: %v", err)
	}
	err = w.Close()
	if err != nil {
		log.Fatalf("failed to close bgzf writer: %v", err)
	}

	// The text to search for.
	const line = `"It ain't any use, Huck, we're wrong again."`

	// Read the data until the line is found and output the line
	// number and bgzf.Chunk corresponding to the line's position
	// in the compressed data.
	r, err := bgzf.NewReader(&buf, 1)
	if err != nil {
		log.Fatal(err)
	}
	var n int
	for {
		n++
		b, chunk, err := readLine(r)
		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
		// Make sure we trim the trailing newline.
		if bytes.Equal(bytes.TrimSpace(b), []byte(line)) {
			fmt.Printf("line:%d chunk:%+v\n", n, chunk)
			break
		}
	}
}

// readLine returns a line terminated by a '\n' and the bgzf.Chunk that contains
// the line, including the newline character. If the end of file is reached before
// a newline, the unterminated line and corresponding chunk are returned.
func readLine(r *bgzf.Reader) ([]byte, bgzf.Chunk, error) {
	tx := r.Begin()
	var (
		data []byte
		b    byte
		err  error
	)
	for {
		b, err = r.ReadByte()
		if err != nil {
			break
		}
		data = append(data, b)
		if b == '\n' {
			break
		}
	}
	chunk := tx.End()
	return data, chunk, err
}
Output: line:5986 chunk:{Begin:{File:112534 Block:11772} End:{File:112534 Block:11818}}
type Tx ¶
type Tx struct {
// contains filtered or unexported fields
}
Tx represents a multi-read transaction.
type Writer ¶
Writer implements BGZF blocked gzip compression.
Because the SAM specification requires that the RFC1952 FLG header field be set to 0x04, a Writer's Name and Comment fields should not be set if its output is to be read by another BGZF decompressor implementation.
func NewWriter ¶
NewWriter returns a new Writer. Writes to the returned writer are compressed and written to w.
The number of concurrent write compressors is specified by wc.
func NewWriterLevel ¶
NewWriterLevel returns a new Writer using the specified compression level instead of gzip.DefaultCompression. Allowable level options are integer values between gzip.BestSpeed and gzip.BestCompression inclusive.
The number of concurrent write compressors is specified by wc.
func (*Writer) Close ¶
Close closes the Writer, waiting for any pending writes before returning the final error of the Writer.
func (*Writer) Flush ¶
Flush writes unwritten data to the underlying io.Writer. Flush does not block.
func (*Writer) Next ¶
Next returns the index of the start of the next write within the decompressed data block.
func (*Writer) Wait ¶
Wait waits for all pending writes to complete and returns the subsequent error state of the Writer.
Directories ¶
Path | Synopsis
---|---
cache | Package cache provides basic block cache types for the bgzf package.
index | Package index provides common code for CSI and tabix BGZF indexing.