Documentation ¶
Overview ¶
Package vcf is a library for parsing, representing, and writing VCF files. See http://samtools.github.io/hts-specs/VCFv4.3.pdf
Index ¶
- Constants
- Variables
- func FormatFormatInformation(out *bufio.Writer, format *FormatInformation, infoNotFormat bool)
- func FormatMetaInformation(out *bufio.Writer, meta interface{})
- func FormatString(out io.ByteWriter, str string)
- func FormatVariants(out *bufio.Writer, variants []Variant)
- func SkipHeader(reader *bufio.Reader) (lines int)
- type FieldParser
- type FormatInformation
- type Genotype
- type Header
- type InputFile
- type MetaInformation
- type OutputFile
- type StringScanner
- func (sc *StringScanner) Len() int
- func (sc *StringScanner) ParseFormatCharacter() interface{}
- func (sc *StringScanner) ParseFormatFloat() interface{}
- func (sc *StringScanner) ParseFormatInformation() *FormatInformation
- func (sc *StringScanner) ParseFormatInteger() interface{}
- func (sc *StringScanner) ParseFormatString() interface{}
- func (sc *StringScanner) ParseGenericInfo() interface{}
- func (sc *StringScanner) ParseInfoCharacter() interface{}
- func (sc *StringScanner) ParseInfoFlag() interface{}
- func (sc *StringScanner) ParseInfoFloat() interface{}
- func (sc *StringScanner) ParseInfoInteger() interface{}
- func (sc *StringScanner) ParseInfoString() interface{}
- func (sc *StringScanner) ParseMetaField() (key, value string)
- func (sc *StringScanner) ParseMetaInformation() interface{}
- func (sc *StringScanner) ParseVariant(vp *VariantParser) Variant
- func (sc *StringScanner) Reset(s string)
- func (sc *StringScanner) SkipSpace()
- type Type
- type Variant
- type VariantParser
- type Vcf
Constants ¶
const (
	VcfExt = ".vcf"
	GzExt  = ".gz"
)
The possible file extensions for VCF or gz-compressed VCF files
const (
	FileFormatVersion     = "VCFv4.3"
	FileFormatVersionLine = "##fileformat=VCFv4.3"
)
The supported VCF file format version.
const (
	NumberA int32 = -1 * (1 + iota)
	NumberR
	NumberG
	NumberDot
	InvalidNumber
)
Constants for format information Number entries.
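The Number constants encode the special symbols of the VCF specification's Number field: A (one value per alternate allele), R (one value per allele, including the reference), G (one value per possible genotype), and "." (count unknown). Because of `-1 * (1 + iota)`, they evaluate to -1 through -5, so any fixed count (>= 0) is distinguishable from them. The sketch below reproduces the constant block and adds a hypothetical helper, `expectedValues`, which is not part of package vcf; it assumes diploid samples when counting genotypes.

```go
package main

import "fmt"

// These mirror the package constants; -1 * (1 + iota) yields
// NumberA = -1, NumberR = -2, NumberG = -3, NumberDot = -4, InvalidNumber = -5.
const (
	NumberA int32 = -1 * (1 + iota)
	NumberR
	NumberG
	NumberDot
	InvalidNumber
)

// expectedValues is a hypothetical helper (not part of package vcf). It returns how
// many values a field with the given Number should carry for a variant with nAlt
// alternate alleles, assuming a diploid sample, or -1 if the count is not fixed.
func expectedValues(number int32, nAlt int) int {
	switch {
	case number >= 0:
		return int(number) // a fixed count such as Number=1
	case number == NumberA:
		return nAlt // one value per alternate allele
	case number == NumberR:
		return nAlt + 1 // one value per allele, including the reference
	case number == NumberG:
		k := nAlt + 1          // number of alleles
		return k * (k + 1) / 2 // number of unordered diploid genotypes
	default:
		return -1 // NumberDot (or InvalidNumber): count unknown
	}
}

func main() {
	fmt.Println(NumberA, NumberR, NumberG, NumberDot, InvalidNumber) // prints -1 -2 -3 -4 -5
	fmt.Println(expectedValues(NumberG, 1))                          // prints 3 (0/0, 0/1, 1/1)
}
```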
Variables ¶
Commonly used VCF entries.
var DefaultHeaderColumns = []string{"CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"}
DefaultHeaderColumns lists the eight mandatory header columns of VCF files.
Functions ¶
func FormatFormatInformation ¶
func FormatFormatInformation(out *bufio.Writer, format *FormatInformation, infoNotFormat bool)
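FormatFormatInformation outputs VCF info or format information. If infoNotFormat is true, the entry is written as info information; otherwise, it is written as format information.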
func FormatMetaInformation ¶
func FormatMetaInformation(out *bufio.Writer, meta interface{})
FormatMetaInformation outputs VCF meta information, which can be either a plain string or a *MetaInformation.
func FormatString ¶
func FormatString(out io.ByteWriter, str string)
FormatString outputs a string to a VCF file, adding necessary double quotes and escapes
func FormatVariants ¶
func FormatVariants(out *bufio.Writer, variants []Variant)
func SkipHeader ¶
func SkipHeader(reader *bufio.Reader) (lines int)
SkipHeader skips a VCF header. This is more efficient than calling ParseHeader and ignoring its result.
Types ¶
type FieldParser ¶
type FieldParser func(*StringScanner) interface{}
FieldParser is an abstraction for parsing VCF fields
func CreateFormatParser ¶
func CreateFormatParser(format *FormatInformation) FieldParser
CreateFormatParser creates a specific VCF format section parser for the given format information
func CreateInfoParser ¶
func CreateInfoParser(format *FormatInformation) FieldParser
CreateInfoParser creates a specific VCF info section parser for the given format information
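CreateFormatParser and CreateInfoParser return a FieldParser chosen once per header entry, so each variant line only pays for one indirect call per field. The sketch below shows that dispatch pattern in isolation. `valueParser` and `createValueParser` are hypothetical names: they take a plain string instead of a *StringScanner, select only on the field Type (ignoring Number and comma-separated lists), and produce the Go representations listed in the package's Type enumeration.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// Type mirrors vcf.Type.
type Type uint

const (
	InvalidType Type = iota
	Integer
	Float
	Flag
	Character
	String
)

// valueParser is a hypothetical, string-based analogue of FieldParser.
type valueParser func(s string) (interface{}, error)

// createValueParser selects a parser once, based on the field type.
func createValueParser(t Type) valueParser {
	switch t {
	case Integer:
		return func(s string) (interface{}, error) { return strconv.Atoi(s) } // int
	case Float:
		return func(s string) (interface{}, error) { return strconv.ParseFloat(s, 64) } // float64
	case Flag:
		return func(string) (interface{}, error) { return true, nil } // always true
	case Character:
		return func(s string) (interface{}, error) {
			r, size := utf8.DecodeRuneInString(s)
			if size == 0 || size != len(s) {
				return nil, fmt.Errorf("not a single character: %q", s)
			}
			return r, nil // rune
		}
	case String:
		return func(s string) (interface{}, error) { return s, nil }
	default:
		return func(string) (interface{}, error) { return nil, fmt.Errorf("invalid type %d", t) }
	}
}

func main() {
	parseDP := createValueParser(Integer) // chosen once, e.g. for ##INFO=<ID=DP,...,Type=Integer>
	v, err := parseDP("42")
	fmt.Println(v, err) // prints 42 <nil>
}
```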
type FormatInformation ¶
type FormatInformation struct {
	ID          utils.Symbol
	Description string // "" if not present
	Number      int32  // > InvalidNumber
	Type        Type
	Fields      utils.StringMap
}
FormatInformation represents an INFO or FORMAT entry in the header of a VCF file.
func NewFormatInformation ¶
func NewFormatInformation() *FormatInformation
NewFormatInformation creates an empty instance.
type Genotype ¶
type Genotype struct {
	Phased bool
	GT     []int32        // < 0 for unknown entries
	Data   utils.SmallMap // values are nil (for missing entry), int, float64, rune, string, or []interface{}
}
Genotype is a structured representation of one sample's genotype data in a VCF variant line: the GT entry is stored in Phased and GT, and the remaining entries in Data.
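In a GT entry, alleles are separated by "/" (unphased) or "|" (phased), and "." marks a missing allele. The hypothetical `parseGT` below sketches how such an entry maps onto the Phased and GT fields, using -1 for missing alleles in line with the "< 0 for unknown entries" convention. It simplifies phasing: VCF 4.3 allows per-allele phase indicators, but this sketch reports Phased as true only when every separator is "|".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGT is a hypothetical sketch, not the package's parser. It converts a GT
// entry such as "0|1" or "./1" into a phased flag and allele indices, where -1
// stands for a missing allele.
func parseGT(s string) (bool, []int32, error) {
	// Simplification: phased only if there is at least one '|' and no '/'.
	phased := strings.Contains(s, "|") && !strings.Contains(s, "/")
	fields := strings.FieldsFunc(s, func(r rune) bool { return r == '/' || r == '|' })
	gt := make([]int32, 0, len(fields))
	for _, f := range fields {
		if f == "." {
			gt = append(gt, -1) // missing allele
			continue
		}
		n, err := strconv.ParseInt(f, 10, 32)
		if err != nil {
			return false, nil, fmt.Errorf("invalid allele %q: %w", f, err)
		}
		gt = append(gt, int32(n))
	}
	return phased, gt, nil
}

func main() {
	phased, gt, _ := parseGT("0|1")
	fmt.Println(phased, gt) // prints true [0 1]
	phased, gt, _ = parseGT("./1")
	fmt.Println(phased, gt) // prints false [-1 1]
}
```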
type Header ¶
type Header struct {
	FileFormat string
	Infos      []*FormatInformation
	Formats    []*FormatInformation
	Meta       map[string][]interface{} // string or *MetaInformation
	Columns    []string
}
Header represents the header section of a VCF file.
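The last header line, starting with "#CHROM", names the columns: the eight mandatory ones (as in DefaultHeaderColumns), then optionally FORMAT followed by one column per sample. The sketch below shows how such a line could be turned into column and sample names. `parseColumnLine` is a hypothetical helper; whether the package's Columns field stores exactly these names is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultHeaderColumns mirrors vcf.DefaultHeaderColumns.
var defaultHeaderColumns = []string{"CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"}

// parseColumnLine is a hypothetical helper: it splits the "#CHROM ..." line,
// validates the mandatory columns and FORMAT, and returns the sample names.
func parseColumnLine(line string) (columns, samples []string, err error) {
	if !strings.HasPrefix(line, "#") {
		return nil, nil, fmt.Errorf("not a column header line: %q", line)
	}
	columns = strings.Split(strings.TrimRight(line[1:], "\r\n"), "\t")
	if len(columns) < len(defaultHeaderColumns) {
		return nil, nil, fmt.Errorf("expected at least %d columns, got %d", len(defaultHeaderColumns), len(columns))
	}
	for i, name := range defaultHeaderColumns {
		if columns[i] != name {
			return nil, nil, fmt.Errorf("column %d: expected %s, got %s", i+1, name, columns[i])
		}
	}
	if len(columns) > 8 && columns[8] != "FORMAT" {
		return nil, nil, fmt.Errorf("column 9: expected FORMAT, got %s", columns[8])
	}
	if len(columns) > 9 { // sample names follow FORMAT
		samples = columns[9:]
	}
	return columns, samples, nil
}

func main() {
	_, samples, err := parseColumnLine("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878\n")
	fmt.Println(samples, err) // prints [NA12878] <nil>
}
```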
func ParseHeader ¶
ParseHeader parses a VCF header
func (*Header) NewVariantParser ¶
func (header *Header) NewVariantParser() *VariantParser
NewVariantParser creates a VariantParser for the given VCF header.
func (*Header) ParseVariants ¶
ParseVariants parses VCF variant lines based on the given VCF header.
type InputFile ¶
InputFile represents a VCF or BCF file for input.
func Open ¶
Open a VCF file for input.
Whether the format is gzipped or not is determined from the content of the input, not from any file extensions.
If the name is "/dev/stdin", then the input is read from os.Stdin
func OpenIfExists ¶
Open a VCF file for input, returning false if it doesn't exist.
Whether the format is gzipped or not is determined from the content of the input, not from any file extensions.
If the name is "/dev/stdin", then the input is read from os.Stdin
type MetaInformation ¶
type MetaInformation struct {
	ID          utils.Symbol
	Description string // "" if not present
	Fields      utils.StringMap
}
MetaInformation represents a structured meta-information entry in the header of a VCF file.
func NewMetaInformation ¶
func NewMetaInformation() *MetaInformation
NewMetaInformation creates an empty instance.
type OutputFile ¶
OutputFile represents a VCF or BCF file for output.
func Create ¶
func Create(name string, format string, level int) *OutputFile
Create a VCF file for output.
The format string can be "vcf" or "gz". If the format string is empty, the output format is determined by looking at the filename extension. If the filename extension is not .gz, then .vcf is always assumed.
The format string will not become part of the resulting filename.
Following zlib, levels range from 1 (BestSpeed) to 9 (BestCompression); higher levels typically run slower but compress more. Level 0 (NoCompression) does not attempt any compression; it only adds the necessary DEFLATE framing. Level -1 (DefaultCompression) uses the default compression level. Level -2 (HuffmanOnly) will use Huffman compression only, giving a very fast compression for all types of input, but sacrificing considerable compression efficiency.
If the name is "/dev/stdout", then the output is written to os.Stdout.
func (*OutputFile) Format ¶
func (output *OutputFile) Format(vcf *Vcf)
Format outputs a full VCF struct.
type StringScanner ¶
type StringScanner struct {
// contains filtered or unexported fields
}
A StringScanner can be used to scan and parse strings representing lines in VCF files.
The zero StringScanner is valid and empty.
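A scanner of this kind can be built around a single string field that is re-sliced as input is consumed. This keeps the zero value usable and avoids copying. The sketch below is a hypothetical, simplified analogue (`scanner`, `readUntil`), since StringScanner's fields are unexported. Reset, Len, and SkipSpace mirror the documented methods, but their exact behavior here, in particular which characters SkipSpace skips, is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// scanner is a hypothetical analogue of StringScanner; its zero value is empty.
type scanner struct {
	data string // the remaining, not yet scanned input
}

func (sc *scanner) Reset(s string) { sc.data = s }
func (sc *scanner) Len() int       { return len(sc.data) }

// SkipSpace drops leading spaces and tabs (an assumption for this sketch).
func (sc *scanner) SkipSpace() {
	i := 0
	for i < len(sc.data) && (sc.data[i] == ' ' || sc.data[i] == '\t') {
		i++
	}
	sc.data = sc.data[i:]
}

// readUntil returns the text before the next delim and consumes the delimiter;
// without a delimiter it returns the rest of the input.
func (sc *scanner) readUntil(delim byte) string {
	i := strings.IndexByte(sc.data, delim)
	if i < 0 {
		field := sc.data
		sc.data = ""
		return field
	}
	field := sc.data[:i]
	sc.data = sc.data[i+1:]
	return field
}

func main() {
	var sc scanner // the zero value is ready to use
	sc.Reset("chr1\t100\trs1")
	first := sc.readUntil('\t')
	fmt.Println(first, sc.Len()) // prints chr1 7
}
```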
func (*StringScanner) Len ¶
func (sc *StringScanner) Len() int
Len returns the number of ASCII characters that still need to be scanned/parsed.
func (*StringScanner) ParseFormatCharacter ¶
func (sc *StringScanner) ParseFormatCharacter() interface{}
ParseFormatCharacter parses a rune in a VCF format section
func (*StringScanner) ParseFormatFloat ¶
func (sc *StringScanner) ParseFormatFloat() interface{}
ParseFormatFloat parses a floating point number in a VCF format section
func (*StringScanner) ParseFormatInformation ¶
func (sc *StringScanner) ParseFormatInformation() *FormatInformation
ParseFormatInformation parses VCF format information
func (*StringScanner) ParseFormatInteger ¶
func (sc *StringScanner) ParseFormatInteger() interface{}
ParseFormatInteger parses an integer in a VCF format section
func (*StringScanner) ParseFormatString ¶
func (sc *StringScanner) ParseFormatString() interface{}
ParseFormatString parses a string in a VCF format section
func (*StringScanner) ParseGenericInfo ¶
func (sc *StringScanner) ParseGenericInfo() interface{}
ParseGenericInfo parses a VCF info section without specific format information
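Without a matching ##INFO header entry, a parser cannot know a field's type, but it can still recognize the INFO column's structure: entries separated by ";", key=value pairs, and keys without "=" acting as flags. The hypothetical `parseInfoColumn` below sketches that structure. It keeps every value as a string, which is an assumption: the package's generic parser may infer richer types.

```go
package main

import (
	"fmt"
	"strings"
)

// parseInfoColumn is a hypothetical sketch of untyped INFO parsing: flags become
// true, other values stay strings, and "." means the column is missing.
func parseInfoColumn(column string) map[string]interface{} {
	info := make(map[string]interface{})
	if column == "." || column == "" {
		return info
	}
	for _, entry := range strings.Split(column, ";") {
		key, value, found := strings.Cut(entry, "=")
		if !found {
			info[key] = true // flag, e.g. DB
			continue
		}
		info[key] = value // type unknown without header information
	}
	return info
}

func main() {
	info := parseInfoColumn("DP=14;DB;AF=0.5,0.25")
	fmt.Println(info["DP"], info["DB"], info["AF"]) // prints 14 true 0.5,0.25
}
```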
func (*StringScanner) ParseInfoCharacter ¶
func (sc *StringScanner) ParseInfoCharacter() interface{}
ParseInfoCharacter parses a rune in a VCF info section
func (*StringScanner) ParseInfoFlag ¶
func (sc *StringScanner) ParseInfoFlag() interface{}
ParseInfoFlag parses a boolean flag in a VCF info section (always returns true)
func (*StringScanner) ParseInfoFloat ¶
func (sc *StringScanner) ParseInfoFloat() interface{}
ParseInfoFloat parses a floating point number in a VCF info section
func (*StringScanner) ParseInfoInteger ¶
func (sc *StringScanner) ParseInfoInteger() interface{}
ParseInfoInteger parses an integer in a VCF info section
func (*StringScanner) ParseInfoString ¶
func (sc *StringScanner) ParseInfoString() interface{}
ParseInfoString parses a string in a VCF info section
func (*StringScanner) ParseMetaField ¶
func (sc *StringScanner) ParseMetaField() (key, value string)
ParseMetaField parses a VCF meta field
func (*StringScanner) ParseMetaInformation ¶
func (sc *StringScanner) ParseMetaInformation() interface{}
ParseMetaInformation parses VCF meta information
func (*StringScanner) ParseVariant ¶
func (sc *StringScanner) ParseVariant(vp *VariantParser) Variant
ParseVariant parses a VCF variant line
func (*StringScanner) Reset ¶
func (sc *StringScanner) Reset(s string)
Reset resets the scanner, and initializes it with the given string.
type Type ¶
type Type uint
Type is an enumeration type for different VCF field types
const (
	InvalidType Type = iota
	Integer   // represented as int (not int32, since that's the same as rune in Go)
	Float     // represented as float64 (parsing as float32 seems problematic in some cases in Go)
	Flag      // represented as bool with fixed value true
	Character // represented as rune
	String    // represented as string
)
The different VCF field types
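The choice of int for Integer matters because parsed values are stored as interface{}. In Go, rune is an alias for int32, so an int32 and a rune have the same dynamic type and a type switch cannot tell them apart. A switch listing both `case int32:` and `case rune:` does not even compile (duplicate case). The small sketch below, with a hypothetical `describe` function, demonstrates this.

```go
package main

import "fmt"

// describe is a hypothetical helper that maps a stored value back to the VCF
// field type it represents, using the Go representations listed above.
func describe(v interface{}) string {
	switch v.(type) {
	case int:
		return "Integer"
	case float64:
		return "Float"
	case bool:
		return "Flag"
	case rune: // identical to int32
		return "Character"
	case string:
		return "String"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(describe(42), describe('A'), describe(int32(42))) // prints Integer Character Character
}
```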
type Variant ¶
type Variant struct {
	Source         string // this is not part of the VCF spec, but is needed in HaplotypeCaller
	Chrom          string
	Pos            int32 // < 0 if unknown
	ID             []string // nil/empty if missing
	Ref            string
	Alt            []string       // nil/empty if missing
	Qual           interface{}    // float64, or nil if missing
	Filter         []utils.Symbol // nil/empty if missing
	Info           utils.SmallMap // values are int, float64, bool, rune, string, or []interface{}
	GenotypeFormat []utils.Symbol
	GenotypeData   []Genotype
}
Variant represents a variant (data) line in a VCF file.
func (*Variant) End ¶
End returns the end position of a VCF line in the reference, determined either by the END field or len(v.Ref)
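Under the VCF specification's 1-based, inclusive coordinates, a variant without an END entry covers len(REF) reference bases starting at POS, so its last base is POS + len(REF) - 1. The sketch below applies that convention with a hypothetical `variantEnd` function that uses a plain map in place of utils.SmallMap. The package's End method may use slightly different arithmetic, so treat this only as an illustration of the two cases.

```go
package main

import "fmt"

// variantEnd is a hypothetical sketch: an INFO END value (stored as int, as
// Integer fields are) takes precedence; otherwise the end is derived from pos and
// the length of the reference allele.
func variantEnd(pos int32, ref string, info map[string]interface{}) int32 {
	if end, ok := info["END"].(int); ok {
		return int32(end)
	}
	return pos + int32(len(ref)) - 1 // last reference base covered (inclusive)
}

func main() {
	fmt.Println(variantEnd(100, "ACGT", nil))                              // prints 103
	fmt.Println(variantEnd(100, "N", map[string]interface{}{"END": 250})) // prints 250
}
```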
type VariantParser ¶
VariantParser is an optimized parser for VCF variant lines.
NSamples can be decreased as necessary to parse fewer samples, including down to zero.
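Limiting the number of samples saves work because the sample columns come last in each variant line: once the desired samples have been read, the rest of the line can be left unsplit. The hypothetical `splitLimited` below illustrates this with `strings.SplitN`, which stops splitting after the requested number of fields. It only splits columns. How VariantParser uses NSamples internally is not shown here and is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLimited is a hypothetical sketch: it splits a variant line into the nine
// leading columns (eight fixed ones plus FORMAT) and at most nSamples sample
// columns, never splitting the remainder of the line.
func splitLimited(line string, nSamples int) (fixed, samples []string) {
	const nFixed = 9 // CHROM ... INFO, FORMAT
	parts := strings.SplitN(line, "\t", nFixed+nSamples+1)
	if len(parts) > nFixed+nSamples {
		parts = parts[:nFixed+nSamples] // drop the unsplit remainder
	}
	if len(parts) <= nFixed {
		return parts, nil
	}
	return parts[:nFixed], parts[nFixed:]
}

func main() {
	line := "1\t100\t.\tA\tG\t50\tPASS\tDP=9\tGT\t0/1\t1/1\t0/0"
	fixed, samples := splitLimited(line, 1)
	fmt.Println(len(fixed), samples) // prints 9 [0/1]
}
```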