vcf

package
v4.1.6
Published: Aug 19, 2020 License: AGPL-3.0, AGPL-3.0-or-later Imports: 12 Imported by: 0

Documentation

Overview

Package vcf is a library for parsing and representing VCF files. See http://samtools.github.io/hts-specs/VCFv4.3.pdf

Constants

const (
	VcfExt = ".vcf"
	BcfExt = ".bcf"
	GzExt  = ".gz"
)

The possible file extensions for VCF or BCF files, or gz-compressed VCF files

const (
	FileFormatVersion     = "VCFv4.3"
	FileFormatVersionLine = "##fileformat=VCFv4.3"
)

The supported VCF file format version.

const (
	NumberA int32 = -1 * (1 + iota)
	NumberR
	NumberG
	NumberDot
	InvalidNumber
)

Constants for format information Number entries.
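
The iota expression above gives each constant a distinct negative value, so non-negative values of Number stay free for literal counts. The sketch below reproduces the constants locally so that it compiles on its own. numberString is a hypothetical helper, not part of this package, and the mapping to the header tokens A, R, G and "." is inferred from the constant names and the VCF specification.

```go
package main

import (
	"fmt"
	"strconv"
)

// Local copies of the Number constants, reproduced so the example is
// self-contained. iota starts at 0, so the values run from -1 to -5.
const (
	NumberA       int32 = -1 * (1 + iota) // -1
	NumberR                               // -2
	NumberG                               // -3
	NumberDot                             // -4
	InvalidNumber                         // -5
)

// numberString is a hypothetical helper (an assumption, not package API)
// that renders a Number value as it would appear in a header line.
func numberString(n int32) string {
	switch n {
	case NumberA:
		return "A"
	case NumberR:
		return "R"
	case NumberG:
		return "G"
	case NumberDot:
		return "."
	}
	if n <= InvalidNumber {
		return "" // not a valid Number entry
	}
	return strconv.Itoa(int(n))
}

func main() {
	for _, n := range []int32{0, 2, NumberA, NumberR, NumberG, NumberDot} {
		fmt.Printf("%d -> %q\n", n, numberString(n))
	}
}
```

This also explains the "> InvalidNumber" comment on FormatInformation.Number: every valid value, symbolic or literal, is greater than InvalidNumber.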

Variables

var (
	END  = utils.Intern("END")
	GT   = utils.Intern("GT")
	PASS = utils.Intern("PASS")
)

Commonly used VCF entries.

var DefaultHeaderColumns = []string{"CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"}

DefaultHeaderColumns for VCF files.

Functions

func FormatFormatInformation

func FormatFormatInformation(out *bufio.Writer, format *FormatInformation, infoNotFormat bool) error

FormatFormatInformation outputs VCF info or format information

func FormatMetaInformation

func FormatMetaInformation(out *bufio.Writer, meta interface{}) error

FormatMetaInformation outputs VCF meta information, which can be just a string or *MetaInformation

func FormatString

func FormatString(out io.ByteWriter, str string) error

FormatString outputs a string to a VCF file, adding necessary double quotes and escapes

Types

type FieldParser

type FieldParser func(*StringScanner) interface{}

FieldParser is an abstraction for parsing VCF fields

func CreateFormatParser

func CreateFormatParser(format *FormatInformation) (FieldParser, error)

CreateFormatParser creates a specific VCF format section parser for the given format information

func CreateInfoParser

func CreateInfoParser(format *FormatInformation) (FieldParser, error)

CreateInfoParser creates a specific VCF info section parser for the given format information

type FormatInformation

type FormatInformation struct {
	ID          utils.Symbol
	Description string // "" if not present
	Number      int32  // > InvalidNumber
	Type        Type
	Fields      utils.StringMap
}

FormatInformation in VCF files.

func NewFormatInformation

func NewFormatInformation() *FormatInformation

NewFormatInformation creates an empty instance.

type Header

type Header struct {
	FileFormat string
	Infos      []*FormatInformation
	Formats    []*FormatInformation
	Meta       map[string][]interface{} // string or *MetaInformation
	Columns    []string
}

Header section of a VCF file.

func NewHeader

func NewHeader() *Header

NewHeader creates an empty instance.

func ParseHeader

func ParseHeader(reader *bufio.Reader) (hdr *Header, lines int, err error)

ParseHeader parses a VCF header

func (*Header) Format

func (header *Header) Format(out *bufio.Writer) (err error)

Format outputs a VCF header

func (*Header) NewVariantParser

func (header *Header) NewVariantParser() (*VariantParser, error)

NewVariantParser creates a VariantParser for the given VCF header.

type InputFile

type InputFile struct {
	*bufio.Reader
	*exec.Cmd
	// contains filtered or unexported fields
}

InputFile represents a VCF or BCF file for input.

func Open

func Open(name string, headerOnly bool) (*InputFile, error)

Open a VCF file for input.

If the filename extension is .bcf or .gz, use bcftools view for input. Tell bcftools view to only return the header section for input when headerOnly is true.

bcftools must be visible in the directories named by the PATH environment variable for .bcf or .gz input.

If the filename extension is not .bcf or .gz, then .vcf is always assumed.

If the name is "/dev/stdin", then the input is read from os.Stdin
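
The extension rule above can be sketched as follows. needsBcftools is a hypothetical helper, not part of this package, and matching on the final extension with filepath.Ext is an assumption about how the check is done.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Local copies of the package constants, so the example is self-contained.
const (
	VcfExt = ".vcf"
	BcfExt = ".bcf"
	GzExt  = ".gz"
)

// needsBcftools is a hypothetical helper (not package API). It reports
// whether a file name would be routed through bcftools view under the rule
// described above: only .bcf and .gz qualify, anything else is read as VCF.
func needsBcftools(name string) bool {
	switch filepath.Ext(name) {
	case BcfExt, GzExt:
		return true
	default:
		return false
	}
}

func main() {
	for _, name := range []string{"calls.vcf", "calls.bcf", "calls.vcf.gz", "calls.txt", "/dev/stdin"} {
		fmt.Printf("%-14s bcftools=%v\n", name, needsBcftools(name))
	}
}
```

Note that under this rule a name such as calls.txt is still treated as plain VCF text, which matches the statement that .vcf is always assumed.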

func (*InputFile) Close

func (input *InputFile) Close() error

Close the VCF input file. If bcftools view is used for input, wait for its process to finish.

func (*InputFile) VcfReader

func (input *InputFile) VcfReader() *Reader

VcfReader returns the reader for a VCF or BCF InputFile.

type MetaInformation

type MetaInformation struct {
	ID          utils.Symbol
	Description string // "" if not present
	Fields      utils.StringMap
}

MetaInformation in VCF files.

func NewMetaInformation

func NewMetaInformation() *MetaInformation

NewMetaInformation creates an empty instance.

type OutputFile

type OutputFile struct {
	*bufio.Writer
	*exec.Cmd
	// contains filtered or unexported fields
}

OutputFile represents a VCF or BCF file for output.

func Create

func Create(name string, compressed bool) (*OutputFile, error)

Create a VCF file for output.

If the filename extension is .bcf or .gz, use bcftools view for output.

bcftools must be visible in the directories named by the PATH environment variable for .bcf or .gz output.

If the filename extension is not .bcf or .gz, then .vcf is always assumed.

If the name is "/dev/stdout", then the output is written to os.Stdout.

func (*OutputFile) Close

func (output *OutputFile) Close() error

Close the VCF output file. If bcftools view is used for output, wait for its process to finish.

func (*OutputFile) VcfWriter

func (output *OutputFile) VcfWriter() *Writer

VcfWriter returns the Writer for a VCF or BCF OutputFile.

type Reader

type Reader bufio.Reader

Reader is a bufio.Reader for a VCF or BCF InputFile.

type StringScanner

type StringScanner struct {
	// contains filtered or unexported fields
}

A StringScanner can be used to scan/parse strings representing lines in VCF files.

The zero StringScanner is valid and empty.

func (*StringScanner) Err

func (sc *StringScanner) Err() error

Err returns the error that occurred during scanning/parsing.

func (*StringScanner) Len

func (sc *StringScanner) Len() int

Len returns the number of ASCII characters that still need to be scanned/parsed. Returns 0 if Err() would return a non-nil value.

func (*StringScanner) ParseFormatCharacter

func (sc *StringScanner) ParseFormatCharacter() interface{}

ParseFormatCharacter parses a rune in a VCF format section

func (*StringScanner) ParseFormatFloat

func (sc *StringScanner) ParseFormatFloat() interface{}

ParseFormatFloat parses a floating point number in a VCF format section

func (*StringScanner) ParseFormatInformation

func (sc *StringScanner) ParseFormatInformation() *FormatInformation

ParseFormatInformation parses VCF format information

func (*StringScanner) ParseFormatInteger

func (sc *StringScanner) ParseFormatInteger() interface{}

ParseFormatInteger parses an integer in a VCF format section

func (*StringScanner) ParseFormatString

func (sc *StringScanner) ParseFormatString() interface{}

ParseFormatString parses a string in a VCF format section

func (*StringScanner) ParseGenericFormat

func (sc *StringScanner) ParseGenericFormat() interface{}

ParseGenericFormat parses a VCF format section without specific format information

func (*StringScanner) ParseGenericInfo

func (sc *StringScanner) ParseGenericInfo() interface{}

ParseGenericInfo parses a VCF info section without specific format information

func (*StringScanner) ParseInfoCharacter

func (sc *StringScanner) ParseInfoCharacter() interface{}

ParseInfoCharacter parses a rune in a VCF info section

func (*StringScanner) ParseInfoFlag

func (sc *StringScanner) ParseInfoFlag() interface{}

ParseInfoFlag parses a boolean flag in a VCF info section (always returns true)

func (*StringScanner) ParseInfoFloat

func (sc *StringScanner) ParseInfoFloat() interface{}

ParseInfoFloat parses a floating point number in a VCF info section

func (*StringScanner) ParseInfoInteger

func (sc *StringScanner) ParseInfoInteger() interface{}

ParseInfoInteger parses an integer in a VCF info section

func (*StringScanner) ParseInfoString

func (sc *StringScanner) ParseInfoString() interface{}

ParseInfoString parses a string in a VCF info section

func (*StringScanner) ParseMetaField

func (sc *StringScanner) ParseMetaField() (key, value string)

ParseMetaField parses a VCF meta field

func (*StringScanner) ParseMetaInformation

func (sc *StringScanner) ParseMetaInformation() interface{}

ParseMetaInformation parses VCF meta information

func (*StringScanner) ParseVariant

func (sc *StringScanner) ParseVariant(vp *VariantParser) *Variant

ParseVariant parses a VCF variant line

func (*StringScanner) Reset

func (sc *StringScanner) Reset(s string)

Reset resets the scanner, and initializes it with the given string.

func (*StringScanner) SkipSpace

func (sc *StringScanner) SkipSpace()

SkipSpace skips ' ' runes

type Type

type Type uint

Type is an enumeration type for different VCF field types

const (
	InvalidType Type = iota
	Integer          // represented as int (not int32, since that's the same as rune in Go)
	Float            // represented as float64 (parsing as float32 seems problematic in some cases in Go)
	Flag             // represented as bool with fixed value true
	Character        // represented as rune
	String           // represented as string
)

The different VCF field types
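
To make the comments on the constants concrete, the sketch below converts a raw field string into the Go representation listed for each Type. parseValue is a hypothetical helper built on strconv, not the package's parser, and it ignores VCF details such as the "." missing value and comma-separated lists.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// Local copy of the Type enumeration, so the example is self-contained.
type Type uint

const (
	InvalidType Type = iota
	Integer
	Float
	Flag
	Character
	String
)

// parseValue is a hypothetical helper (not package API) that returns the Go
// value documented for each Type: int, float64, bool, rune or string.
func parseValue(t Type, s string) (interface{}, error) {
	switch t {
	case Integer:
		n, err := strconv.Atoi(s)
		if err != nil {
			return nil, err
		}
		return n, nil
	case Float:
		f, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		return f, nil
	case Flag:
		return true, nil // a flag carries no value; presence means true
	case Character:
		r, size := utf8.DecodeRuneInString(s)
		if r == utf8.RuneError || size != len(s) {
			return nil, fmt.Errorf("not a single character: %q", s)
		}
		return r, nil
	case String:
		return s, nil
	}
	return nil, fmt.Errorf("invalid type %d", t)
}

func main() {
	cases := []struct {
		t Type
		s string
	}{{Integer, "42"}, {Float, "0.5"}, {Flag, ""}, {Character, "A"}, {String, "PASS"}}
	for _, c := range cases {
		v, err := parseValue(c.t, c.s)
		fmt.Printf("%T %v %v\n", v, v, err)
	}
}
```

Printing with %T shows the Character value as int32, since rune is an alias for int32. That is the ambiguity the comment on Integer refers to: representing integers as int keeps them distinguishable from characters in a type switch.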

type Variant

type Variant struct {
	Chrom          string
	Pos            int32
	ID             []string // nil/empty if missing
	Ref            string
	Alt            []string       // nil/empty if missing
	Qual           interface{}    // float64, or nil if missing
	Filter         []utils.Symbol // nil/empty if missing
	Info           utils.SmallMap // values are int, float64, bool, rune, string, or []interface{}
	GenotypeFormat []utils.Symbol
	GenotypeData   []utils.SmallMap // values are nil (for missing entry), int, float64, rune, string, or []interface{}
}

Variant line in a VCF file.

func (*Variant) End

func (v *Variant) End() (int32, error)

End returns the end position of a VCF line in the reference, determined either by the END field or len(v.Ref)
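
A sketch of the two cases, under stated assumptions: the fallback uses the VCF convention that a variant without an END entry ends at POS + len(REF) - 1, and END is stored as an int, as the Integer type suggests. The variant struct and end function here are pared-down, hypothetical stand-ins with a plain map in place of utils.SmallMap. The package's exact arithmetic and error conditions are not given in this documentation.

```go
package main

import "fmt"

// variant is a hypothetical, reduced stand-in for Variant.
type variant struct {
	Pos  int32
	Ref  string
	Info map[string]interface{}
}

// end is a hypothetical sketch (not the package's method). It prefers an
// END entry in Info and otherwise derives the end from the REF length.
func end(v variant) (int32, error) {
	if e, ok := v.Info["END"]; ok {
		n, isInt := e.(int)
		if !isInt {
			return 0, fmt.Errorf("END is %T, want int", e)
		}
		return int32(n), nil
	}
	// Assumed convention: the last reference base covered by REF.
	return v.Pos + int32(len(v.Ref)) - 1, nil
}

func main() {
	snv := variant{Pos: 100, Ref: "A"}
	del := variant{Pos: 100, Ref: "ACGT"}
	sv := variant{Pos: 100, Ref: "A", Info: map[string]interface{}{"END": 250}}
	for _, v := range []variant{snv, del, sv} {
		e, err := end(v)
		fmt.Println(e, err)
	}
}
```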

func (*Variant) Format

func (variant *Variant) Format(out []byte) ([]byte, error)

Format outputs a VCF variant line

func (*Variant) Pass

func (v *Variant) Pass() bool

Pass determines whether the variant passed all filters.

func (*Variant) Start

func (v *Variant) Start() int32

Start returns the start position of a VCF line in the reference.

type VariantParser

type VariantParser struct {
	InfoParsers, FormatParsers utils.SmallMap
	NSamples                   int
}

VariantParser is an optimized parser for VCF variant lines.

NSamples can be decreased as necessary to parse fewer samples, including down to zero.

type Vcf

type Vcf struct {
	Header   *Header
	Variants []*Variant
}

Vcf represents the full contents of a VCF file.

func (*Vcf) Format

func (vcf *Vcf) Format(out *bufio.Writer) error

Format outputs a full VCF struct

type Writer

type Writer bufio.Writer

Writer is a bufio.Writer for a VCF or BCF OutputFile.
