bgen

Version: v0.0.0-...-fb50bb0
Published: May 12, 2024 | License: BSD-3-Clause | Imports: 15 | Imported by: 7

README

bgen

bgen is a BGEN parser for Go. It can read files in the .bgen format but cannot write them.

This package supports the most common use-cases for BGEN specifications 1.1, 1.2, and 1.3. It does not yet support phased data.

Installation

go get github.com/carbocation/bgen

Requirements

For BGEN specifications 1.1, 1.2, and 1.3, this package is usable immediately after go get. Currently, only unphased data is correctly supported.

API

The API is under active development, and the public API may still change.

For the current API, please see the BGEN Godoc.

Documentation

Constants

const BGENVersion = "1.2"

BGENVersion is the supported version of the BGEN file format

const MagicNumber = "bgen"

MagicNumber contains the value required to confirm that a file is BGEN-conformant

Functions

func Choose

func Choose(n, k int) int

Choose returns the number of ways to choose k items from n items. Originally derived from github.com/limix/bgen/src/util/choose.c
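
As an illustration of what such a function computes, here is a minimal standalone sketch of a binomial coefficient. It is not the package's implementation: the name chooseSketch is hypothetical, and returning 0 for out-of-range arguments is an assumption. One place this count arises in BGEN is unphased layout 2 data, where a sample with ploidy Z at a variant with K alleles has C(Z+K-1, K-1) possible genotypes.

```go
package main

import "fmt"

// chooseSketch returns the binomial coefficient C(n, k).
// Hypothetical stand-in for illustration; not the package's own code.
// Assumption: out-of-range arguments (k < 0 or k > n) yield 0.
func chooseSketch(n, k int) int {
	if k < 0 || k > n {
		return 0
	}
	if k > n-k {
		k = n - k // C(n, k) == C(n, n-k); iterate over the smaller k
	}
	result := 1
	for i := 1; i <= k; i++ {
		// After this step result == C(n-k+i, i), so the division is exact.
		result = result * (n - k + i) / i
	}
	return result
}

func main() {
	// Genotype count for a diploid sample (Z=2) at a biallelic variant (K=2):
	// C(Z+K-1, K-1) = C(3, 1).
	fmt.Println(chooseSketch(3, 1))
	fmt.Println(chooseSketch(5, 2))
}
```

Multiplying before dividing keeps every intermediate value an integer, at the cost of overflowing sooner than a more careful implementation would for large n.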

func DecompressZStandard

func DecompressZStandard(dst, src []byte) ([]byte, error)

DecompressZStandard decompresses Zstd compressed data for bgen13. As per the original, "Decompress src into dst. If you have a buffer to use, you can pass it to prevent allocation. If it is too small, or if nil is passed, a new buffer will be allocated and returned."

func WhichSQLiteDriver

func WhichSQLiteDriver() string

Types

type Allele

type Allele string

func (Allele) String

func (a Allele) String() string

type BGEN

type BGEN struct {
	FilePath         string                  // TODO: Make private, expose fully resolved path by method?
	File             genomisc.ReaderAtCloser // TODO: Make private, expose by method (if at all)?
	NVariants        uint32                  // TODO: Make private, expose by method?
	NSamples         uint32                  // TODO: Make private, expose by method?
	FlagCompression  Compression
	FlagLayout       Layout
	FlagHasSampleIDs bool
	SamplesStart     uint32 // TODO: Make private, expose by method (if at all)?
	VariantsStart    uint32 // TODO: Make private, expose by method (if at all)?
}

BGEN is the main object used for parsing BGEN files

func Open

func Open(path string) (*BGEN, error)

Open attempts to read the bgen file located at path. On success, it returns a new BGEN object; otherwise, it returns an error. Note that *os.File trivially satisfies genomisc.ReaderAtCloser, so an *os.File can be provided. If path starts with gs://, it is assumed to be a Google Storage object, and Open will attempt to read it with your default credentials.

func OpenFromGoogleStorageWithContext

func OpenFromGoogleStorageWithContext(b *BGEN, ctx context.Context) (*BGEN, error)

func (*BGEN) Close

func (b *BGEN) Close() error

func (*BGEN) NewVariantReader

func (b *BGEN) NewVariantReader() *VariantReader

type BGIIndex

type BGIIndex struct {
	DB       *sqlx.DB
	Metadata *BGIMetadata
}

func OpenBGI

func OpenBGI(path string) (*BGIIndex, error)

func (*BGIIndex) Close

func (b *BGIIndex) Close() error

type BGIMetadata

type BGIMetadata struct {
	Filename           string
	FileSize           uint   `db:"file_size"`
	LastWriteTime      Time   `db:"last_write_time"`
	FirstThousandBytes []byte `db:"first_1000_bytes"`
	IndexCreationTime  Time   `db:"index_creation_time"`
}

BGIMetadata conforms to the data found in the rows of the SQLite table "Metadata" from more recent versions of BGEN.

type Compression

type Compression uint32

Compression indicates how (and whether) the SNP block probability data is compressed

const (
	CompressionDisabled Compression = iota
	CompressionZLIB
	CompressionZStandard
)

func (Compression) String

func (c Compression) String() string

type Layout

type Layout uint32

Layout is a versioned variant structure outlined by the BGEN spec

const (
	Layout1 Layout = iota
	Layout2
)

func (Layout) String

func (l Layout) String() string
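
For context, the BGEN header stores compression, layout, and the sample-identifier flag together in one 32-bit flags field: bits 0-1 hold the compression type, bits 2-5 hold the layout, and bit 31 indicates whether sample identifiers are present. The sketch below decodes those raw values. The names headerFlags and decodeFlags are hypothetical, and how this package maps the raw values onto its Compression and Layout constants is not documented here, so treat this as an illustration of the spec rather than of the package's internals.

```go
package main

import "fmt"

// headerFlags holds the raw values decoded from the 32-bit flags field.
// Hypothetical type for illustration; not part of the bgen package.
type headerFlags struct {
	Compression  uint32 // raw spec value: 0 = none, 1 = zlib, 2 = zstd
	Layout       uint32 // raw spec value: 1 or 2
	HasSampleIDs bool
}

// decodeFlags splits the flags field into its three documented parts.
func decodeFlags(flags uint32) headerFlags {
	return headerFlags{
		Compression:  flags & 0x3,        // bits 0-1
		Layout:       (flags >> 2) & 0xF, // bits 2-5
		HasSampleIDs: flags>>31 == 1,     // bit 31
	}
}

func main() {
	// zlib compression (1), layout 2 (2<<2), sample IDs present (1<<31).
	f := decodeFlags(1 | 2<<2 | 1<<31)
	fmt.Printf("%+v\n", f)
}
```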

type Sample

type Sample struct {
	SampleID string
}

func ReadSamples

func ReadSamples(b *BGEN) ([]Sample, error)

type SampleProbability

type SampleProbability struct {
	Missing       bool
	Ploidy        uint8 // Limited to 0-63
	Probabilities []float64
}

SampleProbability represents the variant data for one specific individual at one specific locus, including whether the data is missing, what that individual's ploidy is, and either (1) the probabilities for the phased haplotypes or (2) the probabilities for the genotypes.
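
A common downstream use of genotype probabilities is computing an expected allele dosage. The sketch below does this for a diploid sample at a biallelic variant using a local struct that mirrors the documented fields. It rests on an assumption that is not confirmed by this documentation: that Probabilities holds all three genotype probabilities in the order P(AA), P(AB), P(BB). The names sampleProbability and expectedDosage are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// sampleProbability mirrors the documented fields of SampleProbability.
// It is a local stand-in so the sketch compiles without the package.
type sampleProbability struct {
	Missing       bool
	Ploidy        uint8
	Probabilities []float64
}

var errNotApplicable = errors.New("dosage: need a non-missing diploid sample with 3 genotype probabilities")

// expectedDosage returns the expected count of the second allele (B).
// ASSUMPTION: Probabilities is ordered [P(AA), P(AB), P(BB)] and holds
// all three values; verify this against the package before relying on it.
func expectedDosage(sp sampleProbability) (float64, error) {
	if sp.Missing || sp.Ploidy != 2 || len(sp.Probabilities) != 3 {
		return 0, errNotApplicable
	}
	// One copy of B in AB, two copies in BB.
	return sp.Probabilities[1] + 2*sp.Probabilities[2], nil
}

func main() {
	sp := sampleProbability{Ploidy: 2, Probabilities: []float64{0.1, 0.2, 0.7}}
	d, err := expectedDosage(sp)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.2f\n", d)
}
```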

type Time

type Time time.Time

Time exists to facilitate time parsing from the Metadata, because BGEN uses both unixtime and text strings to represent time. Derived from https://github.com/mattn/go-sqlite3/issues/190#issuecomment-343341834f

func (*Time) Scan

func (t *Time) Scan(v interface{}) error

type Variant

type Variant struct {
	// Set up front
	ID         string
	RSID       string
	Chromosome string
	Position   uint32
	NSamples   uint32 // Populated only in Layout1
	NAlleles   uint16
	Alleles    []Allele

	// Conditional based on Layout
	MinimumPloidy       uint8
	MaximumPloidy       uint8
	Phased              bool
	NProbabilityBits    uint8
	SampleProbabilities []SampleProbability
}

type VariantIndex

type VariantIndex struct {
	Chromosome        string
	Position          uint32
	RSID              string `db:"rsid"`
	NAlleles          uint16 `db:"number_of_alleles"`
	Allele1           Allele
	Allele2           Allele
	FileStartPosition uint `db:"file_start_position"`
	SizeInBytes       uint `db:"size_in_bytes"`
}

VariantIndex conforms to the data found in the rows of the SQLite table "Variant" from BGEN Index (.bgi) files, and can be easily parsed with sqlx.

type VariantReader

type VariantReader struct {
	VariantsSeen uint32
	// contains filtered or unexported fields
}

func (*VariantReader) Error

func (vr *VariantReader) Error() error

func (*VariantReader) Read

func (vr *VariantReader) Read() *Variant

Read extracts the next variant and its genotype probabilities from the bitstream. If there are no variants left to read, Read returns nil. If there is a true error, Read populates the error value on the VariantReader, which can be read by calling the Error() method on the VariantReader.

func (*VariantReader) ReadAt

func (vr *VariantReader) ReadAt(byteOffset int64) *Variant

ReadAt extracts the variant and its genotype probabilities from the bitstream at the specified offset. Otherwise, it behaves like Read().
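
Because Read signals both end-of-data and failure by returning nil, callers need to check Error() once the loop ends. The sketch below demonstrates that pattern with local stand-in types (variant, fakeReader, and collectRSIDs are hypothetical and exist only so the example runs without a .bgen file). With the real package, the reader would instead come from Open followed by NewVariantReader.

```go
package main

import (
	"errors"
	"fmt"
)

// variant is a stand-in carrying one of the documented Variant fields.
type variant struct{ RSID string }

// fakeReader is a stand-in with the same Read/Error shape as VariantReader.
type fakeReader struct {
	variants []variant
	next     int
	failAt   int // index at which to simulate a failure; -1 for never
	err      error
}

// Read returns the next variant, or nil at the end of data or on error.
func (r *fakeReader) Read() *variant {
	if r.err != nil || r.next >= len(r.variants) {
		return nil
	}
	if r.next == r.failAt {
		r.err = errors.New("simulated read failure")
		return nil
	}
	v := &r.variants[r.next]
	r.next++
	return v
}

// Error reports the error, if any, that stopped reading.
func (r *fakeReader) Error() error { return r.err }

// collectRSIDs drains the reader, then checks Error() to distinguish
// a clean end of data from a failure partway through.
func collectRSIDs(r *fakeReader) ([]string, error) {
	var ids []string
	for v := r.Read(); v != nil; v = r.Read() {
		ids = append(ids, v.RSID)
	}
	if err := r.Error(); err != nil {
		return ids, err
	}
	return ids, nil
}

func main() {
	r := &fakeReader{variants: []variant{{"rs1"}, {"rs2"}}, failAt: -1}
	ids, err := collectRSIDs(r)
	fmt.Println(ids, err)
}
```

Skipping the final Error() check would make a truncated or corrupt file look identical to a file that simply ended.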

Directories

Path Synopsis
example