gvcf

package
v0.0.0-...-7a0a068
Published: Jun 7, 2023 License: AGPL-3.0, AGPL-3.0 Imports: 8 Imported by: 1

Documentation

Constants

This section is empty.

Variables

var VERSION string = "0.1.0"

Functions

This section is empty.

Types

type GVCFRefVar

type GVCFRefVar struct {
	Type        int
	MessageType int
	RefSeqFlag  bool
	NocSeqFlag  bool
	Out         io.Writer
	Msg         pasta.ControlMessage
	RefBP       byte
	Allele      int

	ChromStr string

	// RefPos is a 0-referenced (zero-based) reference position.
	RefPos int

	OCounter int
	LFMod    int

	PrintHeader bool
	Reference   string
	DataSource  string

	PrevRefBPStart byte
	PrevRefBPEnd   byte

	VCFVer string

	Date time.Time

	Id     string
	Qual   string
	Filter string
	Info   string
	Format string

	PrevStartRefBase byte
	PrevEndRefBase   byte
	PrevStartRefPos  int
	PrevRefLen       int
	PrevVarType      int

	FirstFlag bool

	State int

	StateHistory []GVCFRefVarInfo

	StreamRefPos int

	IgnoreBacktrackingStart bool
	ShowWarning             bool
}

func (*GVCFRefVar) Chrom

func (g *GVCFRefVar) Chrom(chr string)

func (*GVCFRefVar) EmitLine

func (g *GVCFRefVar) EmitLine(vartype int, vcf_ref_pos, vcf_ref_len int, vcf_ref_base byte, alt_field string, sample_field string, out *bufio.Writer) error

(g)VCF lines consist of the following columns:

	0     1   2  3   4   5    6      7    8      9
	chrom pos id ref alt qual filter info format sample
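As a rough illustration of the column layout, the sketch below joins ten values in index order with tabs, which is how VCF separates columns. The function `vcfLine` and its parameters are hypothetical names chosen for this example; they are not part of this package, and EmitLine may format its output differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vcfLine is a hypothetical helper (not part of package gvcf) that joins
// the ten (g)VCF columns, in index order 0..9, with tab characters.
func vcfLine(chrom string, pos int, id, ref, alt, qual, filter, info, format, sample string) string {
	cols := []string{
		chrom,             // 0 chrom
		strconv.Itoa(pos), // 1 pos
		id,                // 2 id
		ref,               // 3 ref
		alt,               // 4 alt
		qual,              // 5 qual
		filter,            // 6 filter
		info,              // 7 info
		format,            // 8 format
		sample,            // 9 sample
	}
	return strings.Join(cols, "\t")
}

func main() {
	// Illustrative values only.
	fmt.Println(vcfLine("chr1", 100, ".", "A", "G", ".", "PASS", ".", "GT", "0/1"))
}
```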

func (*GVCFRefVar) GetRefPos

func (g *GVCFRefVar) GetRefPos() int

func (*GVCFRefVar) Header

func (g *GVCFRefVar) Header(out *bufio.Writer) error

func (*GVCFRefVar) Init

func (g *GVCFRefVar) Init()

func (*GVCFRefVar) Pasta

func (g *GVCFRefVar) Pasta(gvcf_line string, ref_stream *bufio.Reader, out *bufio.Writer) error

Read in gVCF, one line at a time, and create a PASTA stream from it.
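Because Pasta takes a single gVCF line per call, the caller is responsible for reading the input line by line. The sketch below shows one way such a driver loop might look. `feedLines` and `countLines` are hypothetical helpers, and the `handle` callback only stands in for a call such as `g.Pasta(line, ref_stream, out)`; whether header lines should be filtered before they reach Pasta is not stated here, so the sketch passes every line through.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// feedLines is a hypothetical driver: it reads r one line at a time and
// hands each line to handle, stopping at the first error.
func feedLines(r io.Reader, handle func(line string) error) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if err := handle(sc.Text()); err != nil {
			return err
		}
	}
	return sc.Err()
}

// countLines uses feedLines to count the lines in s.
func countLines(s string) (int, error) {
	n := 0
	err := feedLines(strings.NewReader(s), func(string) error {
		n++
		return nil
	})
	return n, err
}

func main() {
	// Illustrative input: one header line and one reference block line.
	input := "##fileformat=VCFv4.1\nchr1\t1\t.\tA\t.\t.\tPASS\tEND=10\tGT\t0/0\n"
	err := feedLines(strings.NewReader(input), func(line string) error {
		fmt.Printf("got a line of %d bytes\n", len(line))
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Note that `bufio.Scanner` has a default maximum token size; very long lines would need a larger buffer set with its `Buffer` method.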

func (*GVCFRefVar) PastaBegin

func (g *GVCFRefVar) PastaBegin(out *bufio.Writer) error

func (*GVCFRefVar) PastaEnd

func (g *GVCFRefVar) PastaEnd(out *bufio.Writer) error

func (*GVCFRefVar) PastaNocallRef

func (g *GVCFRefVar) PastaNocallRef(ref_stream *bufio.Reader, out *bufio.Writer) error

func (*GVCFRefVar) Pos

func (g *GVCFRefVar) Pos(pos int)

func (*GVCFRefVar) Print

func (g *GVCFRefVar) Print(vartype int, ref_start, ref_len int, refseq []byte, altseq [][]byte, out *bufio.Writer) error

(g)VCF lines consist of the following columns:

	0     1   2  3   4   5    6      7    8      9
	chrom pos id ref alt qual filter info format sample

Print receives interpreted 'difference stream' lines, one at a time.

We make a simplifying assumption that if there is a nocall region right next to an alt call, the alt call gets subsumed into the nocall region.

We print the nocall region with full sequence so that it's recoverable but otherwise it looks like a nocall region.

This function is unfortunately quite complicated. The 'difference stream' fed into it reports alternates, nocalls, indels, etc. with only the minimal amount of information and does not report the reference base before or after each event. This means we have to keep state in order to report the reference anchor base as the VCF specification requires. In many cases, we have no choice but to violate the VCF spec because we don't have information about the reference base that came before the current alternate in question. Under this condition, we report the right anchor reference base and put a field in the INFO column to indicate that we've done so.

The general scheme is to save state in a structure called `StateHistory`. We consider all pair transitions ( {ALT,NOC,REF} -> {ALT,NOC,REF} ) to determine whether we can emit a line and what to emit if we can.
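To make the pair-transition idea concrete, the sketch below enumerates all nine ordered pairs and flags the single case that the documentation calls easy (a REF followed by a non-REF). The type `kind`, its constants, and both functions are hypothetical stand-ins; the package's real type codes and its full decision logic are not shown in this documentation.

```go
package main

import "fmt"

// kind is a hypothetical stand-in for the three line types in the text.
type kind int

const (
	REF kind = iota
	ALT
	NOC
)

func (k kind) String() string {
	return [...]string{"REF", "ALT", "NOC"}[k]
}

// transitions enumerates every ordered (previous, current) pair.
func transitions() [][2]kind {
	kinds := []kind{ALT, NOC, REF}
	var out [][2]kind
	for _, prev := range kinds {
		for _, cur := range kinds {
			out = append(out, [2]kind{prev, cur})
		}
	}
	return out
}

// isEasyCase reports whether a transition is a REF line followed by a
// non-REF line. It does not model any of the other transitions.
func isEasyCase(prev, cur kind) bool {
	return prev == REF && cur != REF
}

func main() {
	for _, t := range transitions() {
		fmt.Printf("%v -> %v easy=%v\n", t[0], t[1], isEasyCase(t[0], t[1]))
	}
}
```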

The easy (and hopefully common) case is when a REF line is followed by a non-REF line. In that case, we can emit the REF line, peeling off its last reference base to use as an anchor for the ALT line if need be (for example, when the ALT line is a straight deletion). Problems arise if there are ALT lines without any REF lines in between them, which shouldn't happen if the 'difference stream' is working properly, or, more likely, if the difference stream begins on an ALT or NOC. This special case at the beginning of the stream causes most of the complexity below.
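The easy case can be sketched as follows, assuming the REF run's start position and sequence are known. `peelLeftAnchor` and `deletionFields` are hypothetical helpers written for this illustration, not the package's code. They follow the usual VCF convention for a deletion: REF is the anchor base plus the deleted bases, and ALT is the anchor base alone.

```go
package main

import "fmt"

// peelLeftAnchor splits the last base off a REF run that starts at
// refStart. It returns the shortened run, the position of the peeled
// base, and the base itself. ok is false for an empty run.
func peelLeftAnchor(refStart int, refSeq string) (rest string, anchorPos int, anchor byte, ok bool) {
	n := len(refSeq)
	if n == 0 {
		return "", 0, 0, false
	}
	return refSeq[:n-1], refStart + n - 1, refSeq[n-1], true
}

// deletionFields builds the pos, ref and alt fields for a straight
// deletion that is anchored on the base to its left.
func deletionFields(anchorPos int, anchor byte, deleted string) (pos int, ref, alt string) {
	return anchorPos, string(anchor) + deleted, string(anchor)
}

func main() {
	// A REF run "ACGT" starting at position 100, followed by a deletion
	// of "GG". Values are illustrative only.
	rest, anchorPos, anchor, ok := peelLeftAnchor(100, "ACGT")
	if !ok {
		fmt.Println("no reference base available to use as an anchor")
		return
	}
	pos, ref, alt := deletionFields(anchorPos, anchor, "GG")
	fmt.Printf("REF run now %q; deletion at pos=%d ref=%s alt=%s\n", rest, pos, ref, alt)
}
```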

For example, if the first difference line is a deletion, the anchor reference base needs to be taken from the following REF line. If the next REF line has more than one reference base, then we can peel the first base off the beginning, use it as a right anchor in the currently reported ALT line and promote the REF line to be the base element in the `StateHistory` structure. If the next REF line has only one reference base, then we could end up taking a reference base that subsequent ALT or NOC lines further down the stream might need. In order to reduce this cascading domino effect, the ALT line is extended in this case.

Since this function is to be used with streams that don't need to start at reference position 1, reporting an anchor reference base that isn't to the left of the ALT line immediately violates the VCF specification. There's not much we can do since we want to report gVCF for arbitrary sequences. With the VCF specification as stated, it's impossible to report arbitrary sequence information with rigid fixed endpoints. Instead we make do by occasionally reporting a right anchor point and adding the field `REF_ANCHOR_AT_END=TRUE` to the INFO column.
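A minimal sketch of the right-anchor fallback, under stated assumptions: the deletion is the first line of the stream, the following REF run has at least one base, and the record's position is taken to be the start of the deleted bases. The `record` type and both functions are hypothetical; only the `REF_ANCHOR_AT_END=TRUE` key comes from the documentation above. The single-base case, in which the ALT line is extended, is not modelled.

```go
package main

import "fmt"

// record holds the subset of VCF fields this sketch needs.
type record struct {
	Pos  int
	Ref  string
	Alt  string
	Info string
}

// addInfo appends an entry to an INFO string. INFO entries are separated
// by semicolons, and "." denotes an empty INFO column.
func addInfo(info, entry string) string {
	if info == "" || info == "." {
		return entry
	}
	return info + ";" + entry
}

// rightAnchoredDeletion borrows the first base of the following REF run
// as a right anchor and flags the record in INFO. It returns the record
// and what is left of the REF run. ok is false if no base is available.
func rightAnchoredDeletion(delPos int, deleted, nextRef, info string) (rec record, restRef string, ok bool) {
	if len(nextRef) == 0 {
		return record{}, "", false
	}
	anchor := nextRef[:1]
	rec = record{
		Pos:  delPos, // assumption: position of the first deleted base
		Ref:  deleted + anchor,
		Alt:  anchor,
		Info: addInfo(info, "REF_ANCHOR_AT_END=TRUE"),
	}
	return rec, nextRef[1:], true
}

func main() {
	// Illustrative values: "GG" is deleted at the start of the stream
	// and the next REF run is "ACGT".
	rec, rest, ok := rightAnchoredDeletion(1, "GG", "ACGT", ".")
	if !ok {
		fmt.Println("no following reference base to anchor on")
		return
	}
	fmt.Printf("pos=%d ref=%s alt=%s info=%s remaining REF=%q\n",
		rec.Pos, rec.Ref, rec.Alt, rec.Info, rest)
}
```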

func (*GVCFRefVar) PrintEnd

func (g *GVCFRefVar) PrintEnd(out *bufio.Writer) error

type GVCFRefVarInfo

type GVCFRefVarInfo struct {
	// contains filtered or unexported fields
}
