Documentation ¶
Overview ¶
Package vcfgo implements a Reader and Writer for variant call format (VCF). It eases reading, filtering, and modifying VCFs even if they are not to spec. Example:
	f, err := os.Open("examples/test.auto_dom.no_parents.vcf")
	if err != nil {
		panic(err)
	}
	// the second argument is lazySamples (see NewReader)
	rdr, err := vcfgo.NewReader(f, false)
	if err != nil {
		panic(err)
	}
	for {
		variant := rdr.Read()
		if variant == nil {
			break
		}
		fmt.Printf("%s\t%d\t%s\t%s\n", variant.Chromosome, variant.Pos, variant.Ref(), variant.Alt())
		dp, _ := variant.Info().Get("DP")
		fmt.Printf("%v\n", dp.(int) > 10)
		sample := variant.Samples[0]
		// we can get the PL field as a list (-1 is default in case of missing value)
		pl, _ := variant.GetGenotypeField(sample, "PL", -1)
		fmt.Printf("%v\n", pl)
		_ = sample.DP
	}
	fmt.Fprintln(os.Stderr, rdr.Error())
Example ¶
	package main

	import (
		"fmt"
		"os"

		"github.com/brentp/vcfgo"
	)

	func main() {
		f, _ := os.Open("examples/test.auto_dom.no_parents.vcf")
		rdr, err := vcfgo.NewReader(f, false)
		if err != nil {
			panic(err)
		}
		for {
			variant := rdr.Read()
			if variant == nil {
				break
			}
			fmt.Printf("%s\t%d\t%s\t%s\n", variant.Chromosome, variant.Pos, variant.Ref(), variant.Alt())
			dp, _ := variant.Info().Get("DP")
			fmt.Printf("%v", dp.(int) > 10)
		}
	}
Index ¶
- Constants
- Variables
- func ItoS(k string, v interface{}) string
- type Header
- type Info
- type InfoByte
- func (i *InfoByte) Add(key string, value interface{})
- func (i InfoByte) Bytes() []byte
- func (i InfoByte) Contains(key string) bool
- func (i *InfoByte) Delete(key string)
- func (i InfoByte) Get(key string) (interface{}, error)
- func (i InfoByte) Keys() []string
- func (i InfoByte) SGet(key string) []byte
- func (i *InfoByte) Set(key string, value interface{}) error
- func (i InfoByte) String() string
- func (i *InfoByte) UpdateHeader(key string, value interface{})
- type KV
- type MetaLine
- type MetaType
- type Reader
- func (vr *Reader) AddFormatToHeader(id string, num string, stype string, desc string)
- func (vr *Reader) AddInfoToHeader(id string, num string, stype string, desc string)
- func (vr *Reader) Clear()
- func (vr *Reader) Close() error
- func (vr *Reader) Error() error
- func (vr *Reader) GetHeaderType(field string) string
- func (vr *Reader) Parse(fields [][]byte) *Variant
- func (vr *Reader) Read() *Variant
- type SampleFormat
- type SampleGenotype
- type VCFError
- type Variant
- func (v *Variant) Alt() []string
- func (v *Variant) CIEnd() (uint32, uint32, bool)
- func (v *Variant) CIPos() (uint32, uint32, bool)
- func (v *Variant) Chrom() string
- func (v *Variant) End() uint32
- func (v *Variant) GetGenotypeField(g *SampleGenotype, field string, missing interface{}) (interface{}, error)
- func (v *Variant) Id() string
- func (v *Variant) Info() interfaces.Info
- func (v *Variant) Ref() string
- func (v *Variant) Start() uint32
- func (v *Variant) String() string
- type Writer
Examples ¶
Constants ¶
const MISSING_VAL = 256
MISSING_VAL is used for the quality score, which ranges from 0 to 255. The value 256 lies outside that range, so it can stand for a missing quality (".").
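A minimal sketch of how an out-of-range sentinel like this can be used when parsing the QUAL column. The `missingVal` constant and the `parseQual` helper are hypothetical stand-ins written for illustration; they are not part of vcfgo.

```go
package main

import (
	"fmt"
	"strconv"
)

// missingVal mirrors the documented MISSING_VAL: one above the 0-255 range.
const missingVal = 256

// parseQual is a hypothetical helper: it maps "." to the sentinel and
// parses anything else as a float.
func parseQual(s string) (float32, error) {
	if s == "." {
		return missingVal, nil
	}
	q, err := strconv.ParseFloat(s, 32)
	if err != nil {
		return 0, err
	}
	return float32(q), nil
}

func main() {
	for _, s := range []string{"99.5", "."} {
		q, err := parseQual(s)
		fmt.Println(s, q, err, q == missingVal)
	}
}
```

Because the sentinel cannot collide with a real score, callers can test `q == missingVal` instead of carrying a separate "is missing" flag.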
Variables ¶
Functions ¶
Types ¶
type Header ¶
type Header struct {
	// Added by composition so functions which take this type as a
	// receiver will work with a Header as receiver.
	sync.RWMutex

	// Mandatory first header line for all VCF files.
	FileFormat string

	// Parsed from #CHROM line.
	SampleNames []string

	// This holds an array of meta-information lines
	// in the order in which they were observed in the original header.
	// It does not hold the fileformat meta line, which is parsed
	// separately, nor does it hold the #CHROM line.
	Lines []*MetaLine

	// I think these are all headed to the scrap heap once I have the
	// Structured and Unstructured lists (maps?) working.
	Infos         map[string]*Info
	SampleFormats map[string]*SampleFormat
	Filters       map[string]string
	Extras        []string

	// Contigs is a list of maps of length, URL, etc.
	Contigs []map[string]string

	// ##SAMPLE
	Samples   map[string]string
	Pedigrees []string
}
Header holds a heap of valuable annotation without which it is hard to make sense of the variant records. While the meta-information lines are all optional (except for fileformat), a VCF without a substantial header is very difficult to use. At the absolute minimum, the header should contain a FILTER line for each string used in the FILTER field (column 6, 0-based numbering), an INFO line for each element in the INFO field (column 7), and a FORMAT line for each element in the FORMAT field (column 8).
func NewHeader ¶
func NewHeader() *Header
NewHeader returns a Header with the requisite allocations.
func (*Header) GetLineByTypeAndId ¶
Returns the MetaLine in the Header that matches the supplied type, e.g. `INFO`, `FORMAT`, `fileDate`, and that has the supplied ID. Note that this only works for structured meta-info lines. By definition, within a type there can only be one record with a given ID, so an error is returned if more than one MetaLine is found. The type and ID matching are both case sensitive. The return value is a pointer to the MetaLine held by the Header, so if you change it, you change the original.
func (*Header) GetLinesByType ¶
Returns all MetaLines in the Header that match the supplied type, e.g. `INFO`, `FORMAT`, `fileDate`. Note that the type matching is case sensitive, so `info` and `INFO` are not interchangeable. Also note that the returned slice holds pointers to the MetaLines held by the Header, so if you change them, you change the originals.
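The two caveats above (case-sensitive matching and shared pointers) can be illustrated with a small sketch. The `metaLine` and `header` types and the `linesByType` method are simplified, hypothetical stand-ins, not the package's own implementation.

```go
package main

import "fmt"

// metaLine and header are simplified, hypothetical stand-ins.
type metaLine struct {
	LineKey string
	Value   string
}

type header struct {
	Lines []*metaLine
}

// linesByType returns pointers to the stored lines whose key matches exactly.
func (h *header) linesByType(t string) []*metaLine {
	var out []*metaLine
	for _, l := range h.Lines {
		if l.LineKey == t { // case sensitive: "filedate" != "fileDate"
			out = append(out, l)
		}
	}
	return out
}

func main() {
	h := &header{Lines: []*metaLine{
		{LineKey: "fileDate", Value: "20210905"},
		{LineKey: "source", Value: "demo"},
	}}
	fmt.Println(len(h.linesByType("filedate"))) // wrong case, so nothing matches

	got := h.linesByType("fileDate")
	got[0].Value = "20220101" // mutates the line held by the header
	fmt.Println(h.Lines[0].Value)
}
```

If a caller needs to edit a line without touching the header, copying the struct first (`c := *got[0]`) avoids the aliasing.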
func (*Header) ParseSamples ¶
Force parsing of the sample fields.
type Info ¶
type Info struct {
	Id          string
	Description string
	Number      string // A G R . ''
	Type        string // STRING INTEGER FLOAT FLAG CHARACTER UNKNOWN
	// contains filtered or unexported fields
}
Info holds the Info and Format fields.
func NewInfoFromString ¶
NewInfoFromString parses a key=value string and returns a *Info. Note that the string is not a full INFO line from the header but just that portion between < and > in the INFO line. For example `ID=DP,Number=1,Type=Integer,Description="Total Depth"` not `##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">`
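When starting from a full header line, the portion between < and > has to be cut out first. The sketch below shows one way to do that; `infoBody` is a hypothetical helper written for this illustration and is not part of vcfgo.

```go
package main

import (
	"fmt"
	"strings"
)

// infoBody is a hypothetical helper that returns the text between the first
// '<' and the last '>' of a full header line, or false if either is missing.
func infoBody(line string) (string, bool) {
	start := strings.Index(line, "<")
	end := strings.LastIndex(line, ">")
	if start < 0 || end <= start {
		return "", false
	}
	return line[start+1 : end], true
}

func main() {
	full := `##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">`
	body, ok := infoBody(full)
	fmt.Println(ok)
	fmt.Println(body)
}
```

Using the first `<` and the last `>` keeps the cut correct even when a quoted Description itself contains angle brackets.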
func (*Info) GetKV ¶
GetKV returns the KV for a given key. If the key does not exist, an ErrKeyNotFound error is returned.
type InfoByte ¶
type InfoByte struct { Info []byte // contains filtered or unexported fields }
InfoByte holds the INFO field in a Variant record line. Called by Reader as each variant record in the VCF file is parsed.
func NewInfoByte ¶
func (*InfoByte) UpdateHeader ¶
type KV ¶
type KV struct {
	Key   string
	Value string

	// 0-based index of where this KV appeared in the original
	// string. It is used to recreate meta lines as strings with the
	// key=value pairs in the same order as they were in the original.
	Index int

	// Quote character ([`'"]) if any that was used for the value of the
	// key-value pair. The spec does not state that double quotes must
	// be used for all quoting but it may be so. In any case, we can cope
	// with any of the 3 quoting characters shown above. Quote is empty
	// if the Value was not quoted.
	Quote rune
}
KV holds a key=value pair. It can be used for structured meta-information lines such as INFO, FORMAT and FILTER. See section 1.4 from the VCFv4.3 specification (version 27 Jul 2021; retrieved 2021-09-05) at: https://samtools.github.io/hts-specs/VCFv4.3.pdf
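As a rough illustration of why Index and Quote are worth recording, the sketch below splits a structured body into pairs while respecting quoted commas. The `kv` type, `parseKVs`, and `isQuote` are hypothetical names for this example and are only an assumption about how such parsing could be done, not the package's own code.

```go
package main

import (
	"fmt"
	"strings"
)

// kv is a simplified, hypothetical stand-in for the documented KV type.
type kv struct {
	Key   string
	Value string
	Index int  // 0-based position of the pair in the original string
	Quote rune // quote character, or 0 if the value was unquoted
}

func isQuote(c rune) bool { return c == '"' || c == '\'' || c == '`' }

// parseKVs splits s on commas that sit outside quotes and records each pair.
func parseKVs(s string) []kv {
	var out []kv
	var quote rune // currently open quote character, 0 if none
	start := 0
	flush := func(end int) {
		part := s[start:end]
		eq := strings.IndexByte(part, '=')
		if eq < 0 {
			return // not a key=value pair; skipped in this sketch
		}
		p := kv{Key: part[:eq], Value: part[eq+1:], Index: len(out)}
		if n := len(p.Value); n >= 2 && isQuote(rune(p.Value[0])) && p.Value[n-1] == p.Value[0] {
			p.Quote = rune(p.Value[0])
			p.Value = p.Value[1 : n-1] // store the value without its quotes
		}
		out = append(out, p)
	}
	for i, c := range s {
		switch {
		case quote != 0:
			if c == quote {
				quote = 0
			}
		case isQuote(c):
			quote = c
		case c == ',':
			flush(i)
			start = i + 1
		}
	}
	flush(len(s))
	return out
}

func main() {
	for _, p := range parseKVs(`ID=DP,Number=1,Type=Integer,Description="Total Depth, raw"`) {
		fmt.Printf("%d %s=%s quote=%q\n", p.Index, p.Key, p.Value, p.Quote)
	}
}
```

With Index and Quote kept alongside each pair, the original line can be rebuilt in the same order and with the same quoting.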
type MetaLine ¶
type MetaLine struct {
	LineNumber int

	// MetaType defaults to Unstructured. You can manually set this
	// value but it's best not to. Let the package do the work.
	MetaType MetaType

	// The basic XXX= value which is present in both Structured and
	// Unstructured MetaLines.
	LineKey string

	// Value is only used in Unstructured MetaLines - Structured
	// MetaLines use KVs and Order instead.
	Value string

	// KVs and Order contain the key=value items (as KV) from a
	// Structured MetaLine plus the order in which they occurred in the
	// OgString or the order in which they were added with AddKV().
	// The Order is obeyed by String()
	KVs   map[string]*KV
	Order []string

	// OgString is only available if the MetaLine was created via
	// NewMetaLineFromString().
	OgString string
}
MetaLine is designed to hold information from both structured and unstructured meta-information lines from the VCF header. Different fields are set for the different MetaTypes: KVs and Order will only be set for structured lines, and Value will only be set for unstructured lines.
func NewMetaLine ¶
func NewMetaLine() *MetaLine
NewMetaLine returns a pointer to a MetaLine. By default, the MetaType is Unstructured. If you use the AddKV() function, MetaType will be automatically converted to Structured.
func NewMetaLineFromString ¶
NewMetaLineFromString matches the input string against the pattern for Structured and Unstructured MetaLines and returns a MetaLine. If neither pattern matches, it returns an error.
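The two-pattern approach can be sketched as follows. The regular expressions and the `classify` function are assumptions made for this example (the package's actual patterns are not shown here); the idea is simply to try the stricter structured pattern first.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Hypothetical patterns: structured lines wrap their body in <...>.
var (
	structuredRe   = regexp.MustCompile(`^##([^=]+)=<(.*)>$`)
	unstructuredRe = regexp.MustCompile(`^##([^=]+)=(.*)$`)
)

// classify reports the line key, its body, and whether the line is structured.
func classify(line string) (key, body string, structured bool, err error) {
	if m := structuredRe.FindStringSubmatch(line); m != nil {
		return m[1], m[2], true, nil
	}
	if m := unstructuredRe.FindStringSubmatch(line); m != nil {
		return m[1], m[2], false, nil
	}
	return "", "", false, errors.New("not a meta-information line")
}

func main() {
	for _, l := range []string{
		"##fileDate=20210905",
		`##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">`,
		"#CHROM\tPOS\tID",
	} {
		key, body, structured, err := classify(l)
		fmt.Println(key, body, structured, err)
	}
}
```

The order of the checks matters: every structured line also matches the unstructured pattern, so the structured test has to come first.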
type MetaType ¶
type MetaType int
MetaType is an enum for the type of a header meta-information line.
The related constants for each MetaType are declared starting with index 1.
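In Go, an enum that starts at 1 is usually written with `iota + 1`. The sketch below uses hypothetical lower-case names and an assumed ordering; it does not reproduce the package's actual constants.

```go
package main

import "fmt"

type metaType int

// Hypothetical constants; iota + 1 makes the first value 1 rather than 0.
const (
	unstructured metaType = iota + 1
	structured
)

// String gives each value a readable name and flags anything else.
func (m metaType) String() string {
	switch m {
	case unstructured:
		return "Unstructured"
	case structured:
		return "Structured"
	}
	return "Unknown"
}

func main() {
	var zero metaType
	fmt.Println(int(unstructured), unstructured)
	fmt.Println(int(structured), structured)
	fmt.Println(int(zero), zero)
}
```

One common reason for starting at 1 is that the zero value of the type then matches no declared constant, which makes an uninitialised field easy to spot.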
type Reader ¶
Reader holds information about the current line number (for errors) and the VCF header that indicates the structure of records.
func NewReader ¶
NewReader returns a Reader. If lazySamples is true, then the user will have to call ParseSamples (see Header.ParseSamples) in order to access sample info.
func (*Reader) AddFormatToHeader ¶
AddFormatToHeader adds a FORMAT field to the header.
func (*Reader) AddInfoToHeader ¶
AddInfoToHeader adds an INFO field to the header.
func (*Reader) GetHeaderType ¶
type SampleFormat ¶
type SampleFormat Info
SampleFormat holds the type info for Format fields.
func (*SampleFormat) GetKV ¶
func (s *SampleFormat) GetKV(k string) (*KV, error)
GetKV returns the KV for a given key. If the key does not exist, an ErrKeyNotFound error is returned.
func (*SampleFormat) GetValue ¶
func (s *SampleFormat) GetValue(k string) (string, error)
GetValue returns the value for a given key. If the key does not exist, an ErrKeyNotFound error is returned.
func (*SampleFormat) String ¶
func (s *SampleFormat) String() string
String returns a string representation.
type SampleGenotype ¶
type SampleGenotype struct {
	Phased bool
	GT     []int
	DP     int
	GL     []float64
	GQ     int
	MQ     int
	Fields map[string]string
}
SampleGenotype holds the information about a sample. Several fields are pre-parsed, but all fields are kept in Fields as well.
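To make the pre-parsed Phased and GT fields concrete, here is a sketch of how a genotype string such as `0|1` could be turned into those two values. `parseGT` is a hypothetical helper, and using -1 for a missing allele is an assumption of this example rather than documented behaviour.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGT is a hypothetical helper. It returns the allele indexes and
// whether the genotype is phased ('|' separator). A "." allele becomes -1,
// which is an assumption of this sketch.
func parseGT(s string) ([]int, bool, error) {
	phased := strings.Contains(s, "|")
	isSep := func(r rune) bool { return r == '|' || r == '/' }
	var gt []int
	for _, a := range strings.FieldsFunc(s, isSep) {
		if a == "." {
			gt = append(gt, -1)
			continue
		}
		n, err := strconv.Atoi(a)
		if err != nil {
			return nil, false, err
		}
		gt = append(gt, n)
	}
	return gt, phased, nil
}

func main() {
	for _, s := range []string{"0|1", "1/1", "./."} {
		gt, phased, err := parseGT(s)
		fmt.Println(s, gt, phased, err)
	}
}
```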
func NewSampleGenotype ¶
func NewSampleGenotype() *SampleGenotype
NewSampleGenotype allocates the internals and returns a *SampleGenotype
func (*SampleGenotype) AltDepths ¶
func (s *SampleGenotype) AltDepths() ([]int, error)
AltDepths returns the depths of the alternates for this sample
func (*SampleGenotype) RefDepth ¶
func (s *SampleGenotype) RefDepth() (int, error)
RefDepth returns the depth of the reference allele for this sample.
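A sketch of how reference and alternate depths can be derived from a sample's raw fields. It assumes the depths come from an AD-style field (comma-separated, reference first), which is the usual VCF convention but is an assumption here; the `sample` type and its methods are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// sample is a hypothetical stand-in that only keeps the raw fields.
type sample struct {
	Fields map[string]string
}

// depths parses the AD field into one depth per allele, reference first.
func (s *sample) depths() ([]int, error) {
	ad, ok := s.Fields["AD"]
	if !ok {
		return nil, errors.New("AD not found")
	}
	parts := strings.Split(ad, ",")
	out := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, err
		}
		out[i] = n
	}
	return out, nil
}

// refDepth returns the first AD value.
func (s *sample) refDepth() (int, error) {
	d, err := s.depths()
	if err != nil {
		return 0, err
	}
	return d[0], nil
}

// altDepths returns every AD value after the first.
func (s *sample) altDepths() ([]int, error) {
	d, err := s.depths()
	if err != nil {
		return nil, err
	}
	return d[1:], nil
}

func main() {
	s := &sample{Fields: map[string]string{"AD": "10,5,3"}}
	ref, _ := s.refDepth()
	alts, _ := s.altDepths()
	fmt.Println(ref, alts)
}
```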
func (*SampleGenotype) String ¶
func (sg *SampleGenotype) String(fields []string) string
String returns the string representation of the sample field.
type VCFError ¶
VCFError satisfies the error interface and allows multiple errors. This is useful because, for example, on a single line, every sample may have a field that doesn't match the description in the header. We want to keep parsing but also let the caller know about the error.
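The pattern described above, collecting errors while parsing continues, can be sketched with a small accumulator type. `multiError`, its methods, and `errBadField` are hypothetical names for this illustration and do not reflect VCFError's actual fields.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errBadField is a sample error used by this sketch.
var errBadField = errors.New("bad field")

// multiError is a hypothetical accumulator that pairs each error with the
// line number where it occurred.
type multiError struct {
	lines []int
	errs  []error
}

// Add records err and its line; nil errors are ignored.
func (m *multiError) Add(err error, line int) {
	if err == nil {
		return
	}
	m.errs = append(m.errs, err)
	m.lines = append(m.lines, line)
}

// IsEmpty reports whether nothing has been recorded.
func (m *multiError) IsEmpty() bool { return len(m.errs) == 0 }

// Error joins all recorded errors, so *multiError satisfies error.
func (m *multiError) Error() string {
	msgs := make([]string, len(m.errs))
	for i, e := range m.errs {
		msgs[i] = fmt.Sprintf("line %d: %v", m.lines[i], e)
	}
	return strings.Join(msgs, "; ")
}

func main() {
	m := &multiError{}
	m.Add(errBadField, 3)
	m.Add(nil, 4) // ignored
	m.Add(errBadField, 7)
	var err error = m
	fmt.Println(m.IsEmpty(), err)
}
```

A caller would typically check `IsEmpty()` before treating the accumulator as a failure, since a non-nil pointer stored in an `error` interface is never equal to nil.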
func (*VCFError) Add ¶
Add adds an error and the line number within the VCF where the error took place.
type Variant ¶
type Variant struct {
	Chromosome string
	Pos        uint64
	Id_        string
	Reference  string
	Alternate  []string
	Quality    float32
	Filter     string
	Info_      interfaces.Info
	Format     []string
	Samples    []*SampleGenotype
	Header     *Header
	LineNumber int
	// contains filtered or unexported fields
}
Variant holds the information about a single site. It is analogous to a row in a VCF file.
func (*Variant) CIEnd ¶
CIEnd reports the left and right ends of the confidence interval around the end of an SV, using the CIEND tag. The interval is in BED format, so the end is incremented by 1. For example, if there is no CIEND, the return value is v.End() - 1, v.End().
func (*Variant) CIPos ¶
CIPos reports the left and right ends of the confidence interval around the start of an SV, using the CIPOS tag. The interval is in BED format, so the end is incremented by 1. For example, if there is no CIPOS, the return value is v.Start(), v.Start() + 1.
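The arithmetic behind these intervals can be sketched as below. The `ciInterval` helper is hypothetical, and the way the tag offsets are applied (added to a 0-based anchor, with +1 on the right end) is an assumption extrapolated from the documented no-tag defaults, not a confirmed description of the implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ciInterval is a hypothetical helper. anchor is a 0-based position and ci
// is the raw tag value such as "-10,20" ("" when the tag is absent).
func ciInterval(anchor uint32, ci string) (uint32, uint32, bool) {
	parts := strings.Split(ci, ",")
	if len(parts) != 2 {
		return anchor, anchor + 1, false // no usable tag: 1-base-wide interval
	}
	lo, err1 := strconv.Atoi(parts[0])
	hi, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return anchor, anchor + 1, false
	}
	left := int(anchor) + lo
	if left < 0 {
		left = 0 // clamp so the unsigned result cannot wrap around
	}
	right := int(anchor) + hi + 1 // +1 because BED ends are exclusive
	if right <= left {
		right = left + 1
	}
	return uint32(left), uint32(right), true
}

func main() {
	fmt.Println(ciInterval(100, ""))
	fmt.Println(ciInterval(100, "-10,20"))
}
```

For the end of a variant the same helper would be called with `End() - 1` as the anchor, which reproduces the documented default of `v.End() - 1, v.End()`.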
func (*Variant) GetGenotypeField ¶
func (v *Variant) GetGenotypeField(g *SampleGenotype, field string, missing interface{}) (interface{}, error)
GetGenotypeField uses the information from the header to parse the correct type from a genotype field. It returns an interface that can be asserted to the expected type.
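The idea of header-driven parsing with a default for missing values can be sketched as follows. `parseField` is a hypothetical helper: its `number` and `typ` arguments stand in for the Number and Type of a FORMAT header line, and it handles only Integer fields, so it is a simplified assumption about the approach rather than the package's logic.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseField is a hypothetical helper. number and typ stand in for the
// Number and Type declared in a FORMAT header line.
func parseField(raw, number, typ string, missing interface{}) (interface{}, error) {
	if raw == "" || raw == "." {
		return missing, nil // whole field absent: hand back the default
	}
	if typ != "Integer" {
		return raw, nil // this sketch leaves non-integer types as strings
	}
	if number == "1" {
		n, err := strconv.Atoi(raw)
		if err != nil {
			return nil, err
		}
		return n, nil
	}
	def, ok := missing.(int)
	if !ok {
		return nil, errors.New("missing value must be an int for Integer fields")
	}
	parts := strings.Split(raw, ",")
	out := make([]int, len(parts))
	for i, p := range parts {
		if p == "." {
			out[i] = def // missing element inside a list
			continue
		}
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, err
		}
		out[i] = n
	}
	return out, nil
}

func main() {
	pl, err := parseField("0,.,255", "G", "Integer", -1)
	if err != nil {
		panic(err)
	}
	// The caller asserts the interface to the type the header promises.
	fmt.Println(pl.([]int))
}
```

The caller still has to know which concrete type to assert, which is why the return value is documented as an interface.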
func (*Variant) Info ¶
func (v *Variant) Info() interfaces.Info