Documentation ¶
Index ¶
- Variables
- type GVCFRefVar
- func (g *GVCFRefVar) Chrom(chr string)
- func (g *GVCFRefVar) EmitLine(vartype int, vcf_ref_pos, vcf_ref_len int, vcf_ref_base byte, alt_field string, ...) error
- func (g *GVCFRefVar) GetRefPos() int
- func (g *GVCFRefVar) Header(out *bufio.Writer) error
- func (g *GVCFRefVar) Init()
- func (g *GVCFRefVar) Pasta(gvcf_line string, ref_stream *bufio.Reader, out *bufio.Writer) error
- func (g *GVCFRefVar) PastaBegin(out *bufio.Writer) error
- func (g *GVCFRefVar) PastaEnd(out *bufio.Writer) error
- func (g *GVCFRefVar) PastaNocallRef(ref_stream *bufio.Reader, out *bufio.Writer) error
- func (g *GVCFRefVar) Pos(pos int)
- func (g *GVCFRefVar) Print(vartype int, ref_start, ref_len int, refseq []byte, altseq [][]byte, ...) error
- func (g *GVCFRefVar) PrintEnd(out *bufio.Writer) error
- type GVCFRefVarInfo
Constants ¶
This section is empty.
Variables ¶
var VERSION string = "0.1.0"
Functions ¶
This section is empty.
Types ¶
type GVCFRefVar ¶
type GVCFRefVar struct {
	Type int
	MessageType int
	RefSeqFlag bool
	NocSeqFlag bool
	Out io.Writer
	Msg pasta.ControlMessage
	RefBP byte
	Allele int
	ChromStr string

	// 0 reference
	//
	RefPos int

	OCounter int
	LFMod int
	PrintHeader bool
	Reference string
	DataSource string
	PrevRefBPStart byte
	PrevRefBPEnd byte
	VCFVer string
	Date time.Time
	Id string
	Qual string
	Filter string
	Info string
	Format string
	PrevStartRefBase byte
	PrevEndRefBase byte
	PrevStartRefPos int
	PrevRefLen int
	PrevVarType int
	FirstFlag bool
	State int
	StateHistory []GVCFRefVarInfo
	StreamRefPos int
	IgnoreBacktrackingStart bool
	ShowWarning bool
}
func (*GVCFRefVar) Chrom ¶
func (g *GVCFRefVar) Chrom(chr string)
func (*GVCFRefVar) EmitLine ¶
func (g *GVCFRefVar) EmitLine(vartype int, vcf_ref_pos, vcf_ref_len int, vcf_ref_base byte, alt_field string, sample_field string, out *bufio.Writer) error
0     1   2  3   4   5    6      7    8      9
chrom pos id ref alt qual filter info format sample
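As a rough illustration of the column layout above, the sketch below joins ten column values with tabs in that order. The helper name `vcfLine` and the sample values are hypothetical and are not part of this package; how `EmitLine` itself fills in each column is not shown here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vcfLine is a hypothetical helper (not part of this package) that joins
// the ten (g)VCF columns with tabs, in the order
// chrom, pos, id, ref, alt, qual, filter, info, format, sample.
// pos is written as given; VCF positions are 1-based.
func vcfLine(chrom string, pos int, id, ref, alt, qual, filter, info, format, sample string) string {
	cols := []string{
		chrom, strconv.Itoa(pos), id, ref, alt,
		qual, filter, info, format, sample,
	}
	return strings.Join(cols, "\t")
}

func main() {
	// Illustrative values only.
	fmt.Println(vcfLine("chr1", 101, ".", "A", "T", ".", "PASS", ".", "GT", "0/1"))
}
```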
func (*GVCFRefVar) GetRefPos ¶
func (g *GVCFRefVar) GetRefPos() int
func (*GVCFRefVar) Init ¶
func (g *GVCFRefVar) Init()
func (*GVCFRefVar) PastaBegin ¶
func (g *GVCFRefVar) PastaBegin(out *bufio.Writer) error
func (*GVCFRefVar) PastaNocallRef ¶
func (g *GVCFRefVar) PastaNocallRef(ref_stream *bufio.Reader, out *bufio.Writer) error
func (*GVCFRefVar) Pos ¶
func (g *GVCFRefVar) Pos(pos int)
func (*GVCFRefVar) Print ¶
func (g *GVCFRefVar) Print(vartype int, ref_start, ref_len int, refseq []byte, altseq [][]byte, out *bufio.Writer) error
(g)VCF lines consist of:
0     1   2  3   4   5    6      7    8      9
chrom pos id ref alt qual filter info format sample
Print receives interpreted 'difference stream' lines, one at a time.
We make a simplifying assumption that if there is a nocall region right next to an alt call, the alt call gets subsumed into the nocall region.
We print the nocall region with full sequence so that it's recoverable but otherwise it looks like a nocall region.
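The subsumption rule described above might look roughly like the following sketch. The `region` type, the `subsume` function, and the use of `n` for nocall bases are all assumptions made for illustration; the package's actual bookkeeping lives in unexported state and may differ.

```go
package main

import "fmt"

// kind is a hypothetical label for the three sorts of region.
type kind int

const (
	ref kind = iota
	alt
	noc
)

// region is a hypothetical record of one stretch of the stream.
type region struct {
	Kind   kind
	Start  int    // 0-based reference start
	RefLen int    // number of reference bases covered
	Seq    string // sequence reported for the region
}

// subsume merges an ALT region and a directly adjacent NOC region (in
// either order) into a single NOC region. The full sequence of both is
// kept so the original alternate remains recoverable.
// It returns false when the regions are not an adjacent ALT/NOC pair.
func subsume(a, b region) (region, bool) {
	adjacent := a.Start+a.RefLen == b.Start
	mixed := (a.Kind == alt && b.Kind == noc) || (a.Kind == noc && b.Kind == alt)
	if !adjacent || !mixed {
		return region{}, false
	}
	return region{Kind: noc, Start: a.Start, RefLen: a.RefLen + b.RefLen, Seq: a.Seq + b.Seq}, true
}

func main() {
	a := region{Kind: alt, Start: 10, RefLen: 1, Seq: "T"}
	b := region{Kind: noc, Start: 11, RefLen: 3, Seq: "nnn"}
	merged, ok := subsume(a, b)
	fmt.Println(merged.Start, merged.RefLen, merged.Seq, ok)
}
```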
This function unfortunately became quite complicated. The 'difference stream' that gets fed into this function reports alternates, nocalls, indels, etc. with only the minimal amount of information, omitting the reference base before or after each one. This means we have to keep state in order to report the reference anchor base as the VCF specification requires. In many cases, we have no choice but to violate the VCF spec because we don't have information about the reference base that came before the current alternate in question. Under this condition, we report the right anchor reference base and put a field in the INFO column to indicate that we've done so.
The general scheme is to save state in a structure called `StateHistory`. We consider all pair transitions ( {ALT,NOC,REF} -> {ALT,NOC,REF} ) to determine whether we can emit a line and what to emit if we can.
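To make the pair-transition idea concrete, here is a minimal sketch that enumerates the nine (previous, current) pairs and keeps a small history from which the latest pair can be read. The `Kind`, `Transition`, and `History` types are hypothetical stand-ins; the real `StateHistory` holds `GVCFRefVarInfo` values whose fields are unexported, and the decision taken for each pair is not reproduced here.

```go
package main

import "fmt"

// Kind is a hypothetical label for a difference-stream line.
type Kind int

const (
	REF Kind = iota
	ALT
	NOC
)

func (k Kind) String() string {
	return [...]string{"REF", "ALT", "NOC"}[k]
}

// Transition is one (previous, current) pair of line kinds.
type Transition struct {
	Prev, Cur Kind
}

// AllTransitions enumerates every pair in {REF,ALT,NOC} x {REF,ALT,NOC}.
func AllTransitions() []Transition {
	kinds := []Kind{REF, ALT, NOC}
	var out []Transition
	for _, p := range kinds {
		for _, c := range kinds {
			out = append(out, Transition{Prev: p, Cur: c})
		}
	}
	return out
}

// History is a hypothetical, much reduced stand-in for StateHistory.
type History struct {
	entries []Kind
}

// Push records a new line kind and returns the transition it completes.
// The boolean is false for the very first line, which has no predecessor.
func (h *History) Push(k Kind) (Transition, bool) {
	h.entries = append(h.entries, k)
	n := len(h.entries)
	if n < 2 {
		return Transition{}, false
	}
	return Transition{Prev: h.entries[n-2], Cur: k}, true
}

func main() {
	for _, t := range AllTransitions() {
		fmt.Printf("%v -> %v\n", t.Prev, t.Cur)
	}
}
```

A caller would typically switch on the returned `Transition` to decide whether a line can be emitted yet or whether it has to wait for the next line.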
The easy (and hopefully common) case is when there is a REF line followed by a non-REF line. In that case, we can emit the REF line before it, peeling off the last reference base to use as an anchor for the ALT line if need be (for example, when the ALT line is a straight deletion). Problems arise if there are ALT lines without any REF lines in between them, which shouldn't happen if the 'difference stream' is working properly, or, more likely, if the difference stream begins on an ALT or NOC. This special case at the beginning of the stream causes most of the complexity below.
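The easy case can be sketched as follows: the last base of the preceding REF block is peeled off and reused as the left anchor of a deletion. The `refBlock` type and the functions `peelLast` and `deletionFields` are hypothetical names for illustration, positions are 0-based here by assumption, and the sequences are made up.

```go
package main

import "fmt"

// refBlock is a hypothetical stretch of reference sequence.
type refBlock struct {
	Start int    // 0-based position of the first base
	Seq   string // reference bases
}

// peelLast splits the final base off a REF block. It returns the shortened
// block, the peeled base, and that base's 0-based position.
// ok is false when the block is empty and nothing can be peeled.
func peelLast(r refBlock) (rest refBlock, anchor byte, anchorPos int, ok bool) {
	n := len(r.Seq)
	if n == 0 {
		return r, 0, 0, false
	}
	return refBlock{Start: r.Start, Seq: r.Seq[:n-1]}, r.Seq[n-1], r.Start + n - 1, true
}

// deletionFields builds the REF and ALT columns for a deletion that is
// anchored on the base to its left: REF is anchor+deleted, ALT is anchor.
func deletionFields(anchor byte, deleted string) (ref, alt string) {
	return string(anchor) + deleted, string(anchor)
}

func main() {
	block := refBlock{Start: 10, Seq: "ACGTA"}
	rest, anchor, pos, ok := peelLast(block)
	if !ok {
		return
	}
	ref, alt := deletionFields(anchor, "GG")
	fmt.Println("REF block:", rest.Start, rest.Seq)
	fmt.Println("deletion at", pos, "ref", ref, "alt", alt)
}
```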
For example, if the first difference line is a deletion, the anchor reference base needs to be taken from the following REF line. If the next REF line has more than one reference base, then we can peel the first base off its beginning, use it as a right anchor in the current reported ALT line and promote the remaining REF line to be the base element in the `StateHistory` structure. If the next REF line only has one reference base, then we could end up taking a reference base that could be needed by subsequent ALT or NOC lines further down the stream. In order to reduce this cascading domino effect, the ALT line is extended in this case.
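The right-anchor case might be sketched like this: when the stream opens with a deletion, the first base of the following REF block is peeled off and appended as the anchor. The names `refBlock`, `peelFirst`, and `rightAnchoredDeletion` are hypothetical, positions are assumed 0-based, and the single-base case (where the text says the ALT line is extended instead) is only signalled, not implemented.

```go
package main

import "fmt"

// refBlock is a hypothetical stretch of reference sequence.
type refBlock struct {
	Start int    // 0-based position of the first base
	Seq   string // reference bases
}

// peelFirst takes the first base of a REF block for use as a right anchor
// and returns the remaining block, which starts one position later.
// ok is false unless the block has more than one base, so that a
// single-base REF block is never consumed entirely.
func peelFirst(r refBlock) (anchor byte, rest refBlock, ok bool) {
	if len(r.Seq) < 2 {
		return 0, r, false
	}
	return r.Seq[0], refBlock{Start: r.Start + 1, Seq: r.Seq[1:]}, true
}

// rightAnchoredDeletion builds REF and ALT columns for a deletion whose
// anchor base sits to its right: REF is deleted+anchor, ALT is anchor.
func rightAnchoredDeletion(deleted string, anchor byte) (ref, alt string) {
	return deleted + string(anchor), string(anchor)
}

func main() {
	// The stream begins with a deletion of "GG" at position 0,
	// followed by a REF block starting at position 2.
	next := refBlock{Start: 2, Seq: "TACG"}
	anchor, rest, ok := peelFirst(next)
	if !ok {
		fmt.Println("single-base REF block: the ALT line would be extended instead")
		return
	}
	ref, alt := rightAnchoredDeletion("GG", anchor)
	fmt.Println("deletion ref", ref, "alt", alt)
	fmt.Println("promoted REF block:", rest.Start, rest.Seq)
}
```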
Since this function is to be used with streams that don't need to start at reference position 1, reporting an anchor reference base that isn't to the left of the ALT line immediately violates the VCF specification. There's not much we can do since we want to report gVCF for arbitrary sequences. With the VCF specification as stated, it's impossible to report arbitrary sequence information with rigid fixed endpoints. Instead we make do by occasionally reporting a right anchor point and giving a field in the INFO column of `REF_ANCHOR_AT_END=TRUE`.
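As a small sketch of how the `REF_ANCHOR_AT_END=TRUE` marker could be attached, the helper below appends it to an existing INFO value. The function name `infoField` is hypothetical; the only facts relied on are that VCF separates INFO entries with semicolons and uses `.` for a missing value. How the package itself composes the INFO column is not shown here.

```go
package main

import "fmt"

// infoField is a hypothetical helper that returns the INFO column value,
// adding REF_ANCHOR_AT_END=TRUE when the anchor base was taken from the
// right of the variant. An empty or "." value counts as "no INFO entries".
func infoField(existing string, rightAnchor bool) string {
	const flag = "REF_ANCHOR_AT_END=TRUE"
	empty := existing == "" || existing == "."
	switch {
	case rightAnchor && empty:
		return flag
	case rightAnchor:
		return existing + ";" + flag
	case empty:
		return "."
	default:
		return existing
	}
}

func main() {
	fmt.Println(infoField(".", true))
	fmt.Println(infoField("END=120", true))
	fmt.Println(infoField("", false))
}
```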
type GVCFRefVarInfo ¶
type GVCFRefVarInfo struct {
// contains filtered or unexported fields
}