Documentation ¶
Overview ¶
Copyright 2020 Grail Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Index ¶
- Constants
- Variables
- func ChrId(chr string) (int, error)
- func ConvertPileupRowsToBasestrandRio(ctx context.Context, tmpFiles []*os.File, mainPath string, refNames []string) (err error)
- func ConvertPileupRowsToBasestrandTSV(ctx context.Context, tmpFiles []*os.File, mainPath string, colBitset int, ...) (err error)
- func ConvertPileupRowsToTSV(ctx context.Context, tmpFiles []*os.File, mainPath string, colBitset int, ...) (err error)
- func MarshalPileupRow(scratch []byte, p interface{}) ([]byte, error)
- func Pileup(ctx context.Context, xampath, fapath, format, outPrefix string, rawOpts *Opts, ...) (err error)
- func ReadBaseStrandTsvIntoChannel(reader *tsv.Reader, c chan []BaseStrandTsvRow, bufferLen int, fileName string, ...)
- func WriteBaseStrandToTSV(piles []BaseStrandPile, refNames []string, w io.Writer) (err error)
- func WriteBaseStrandTsv(rows []BaseStrandTsvRow, writer io.Writer) error
- func WriteBaseStrandsRio(piles []BaseStrandPile, refNames []string, out io.Writer) error
- func WriteBaseStrandsRioAsTSV(ctx context.Context, path string, w io.Writer) error
- type BaseStrandPile
- type BaseStrandTsvRow
- type BaseStrandUnmarshaller
- type Opts
- type PerReadFeatures
- type PileupPayload
- type PileupRow
- type PosType
Constants ¶
const (
	FieldCounts = 1 << iota
	FieldPerReadA
	FieldPerReadC
	FieldPerReadG
	FieldPerReadT
	FieldPerReadAny = FieldPerReadA | FieldPerReadC | FieldPerReadG | FieldPerReadT
)
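The Field* constants are bit flags, so a set of fields is a bitwise OR of them and membership is tested with a bitwise AND. The sketch below copies the constants locally so that it compiles on its own; the helper hasField is hypothetical and is not part of this package.

```go
package main

import "fmt"

// Local copy of the package's field flags, so this sketch is self-contained.
const (
	FieldCounts = 1 << iota
	FieldPerReadA
	FieldPerReadC
	FieldPerReadG
	FieldPerReadT
	FieldPerReadAny = FieldPerReadA | FieldPerReadC | FieldPerReadG | FieldPerReadT
)

// hasField reports whether every bit of flag is set in fieldsPresent.
// It is a hypothetical helper used only for illustration.
func hasField(fieldsPresent, flag uint32) bool {
	return fieldsPresent&flag == flag
}

func main() {
	// Request base/strand counts plus per-read features for base C.
	fields := uint32(FieldCounts | FieldPerReadC)

	fmt.Println(hasField(fields, FieldCounts))   // counts requested?
	fmt.Println(hasField(fields, FieldPerReadA)) // per-read A requested?
	fmt.Println(fields&FieldPerReadAny != 0)     // any per-read field requested?
}
```

A value built this way has the same shape as the FieldsPresent member of PileupRow.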
const FaEncoding = fasta.Seq8
FaEncoding is the fasta in-memory encoding expected by snp.Pileup(). (Seq8 is actually worse than both ASCII and Base5 for this SNP-pileup, but it simplifies future extension to indels.)
const PosTypeMax = pileup.PosTypeMax
PosTypeMax is the maximum value that can be represented by a PosType.
Variables ¶
var DefaultOpts = Opts{
	Clip:        0,
	FlagExclude: 0xf00,
	Mapq:        60,
	MaxReadLen:  500,
	MaxReadSpan: 511,
	MinBagDepth: 0,
	MinBaseQual: 0,
	Parallelism: 0,
	PerStrand:   false,
	RemoveSq:    false,
	Stitch:      false,
}
Functions ¶
func ConvertPileupRowsToTSV ¶
func MarshalPileupRow ¶
Serialized format:
[0..4): fieldsPresent
[4..8): refID
[8..12): pos
[12..16): depth
if counts present, stored in next 40 bytes
if perRead[pileup.baseA] present, length stored in next 4 bytes, then values stored in next 6*n bytes
if perRead[pileup.baseC] present... etc.
This is essentially the simplest format that can support the variable-length per-read feature arrays that are needed. It is not difficult to decrease the nominal size of these records by (i) using varints instead of uint32s, and (ii) making fieldsPresent indicate which counts[][] values are nonzero and only storing those; but I wouldn't expect that to be worth the additional complexity since all uses of this marshal function are bundled with the "zstd 1" transformer anyway. (Instead, all the 'extra' complexity in this function concerns (i) avoiding extra allocations and (ii) avoiding a ridiculous number of spurious bounds-checks, in ways that make sense for a wide variety of other serialization functions.)
In the future, we may need to add indel support.
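The layout above can be sketched as follows. This is not the package's implementation: the row type is a simplified, hypothetical stand-in for PileupRow, the little-endian byte order is an assumption (the layout does not state one), per-read features are treated as opaque 6-byte values, and only the base-A per-read array is handled.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	fieldCounts   = 1 << 0 // mirrors FieldCounts
	fieldPerReadA = 1 << 1 // mirrors FieldPerReadA
	nBaseEnum     = 5      // assumed: 40 count bytes / (2 strands * 4 bytes)
	featureSize   = 6      // bytes per per-read feature, from the layout above
)

// row is a hypothetical, simplified stand-in for PileupRow.
type row struct {
	fieldsPresent, refID, pos, depth uint32
	counts                           [nBaseEnum][2]uint32
	perReadA                         [][featureSize]byte // opaque features for base A
}

// marshalRow appends the encoded row to scratch[:0] and returns the result,
// so a caller can reuse one buffer across many rows.
func marshalRow(scratch []byte, r *row) []byte {
	le := binary.LittleEndian // byte order is an assumption
	buf := scratch[:0]
	// Fixed 16-byte header: fieldsPresent, refID, pos, depth.
	buf = le.AppendUint32(buf, r.fieldsPresent)
	buf = le.AppendUint32(buf, r.refID)
	buf = le.AppendUint32(buf, r.pos)
	buf = le.AppendUint32(buf, r.depth)
	// Optional 40-byte counts block, base-major and strand-minor.
	if r.fieldsPresent&fieldCounts != 0 {
		for _, strands := range r.counts {
			buf = le.AppendUint32(buf, strands[0])
			buf = le.AppendUint32(buf, strands[1])
		}
	}
	// Optional variable-length block: 4-byte length, then 6*n bytes.
	if r.fieldsPresent&fieldPerReadA != 0 {
		buf = le.AppendUint32(buf, uint32(len(r.perReadA)))
		for _, f := range r.perReadA {
			buf = append(buf, f[:]...)
		}
	}
	return buf
}

func main() {
	r := &row{fieldsPresent: fieldCounts | fieldPerReadA, refID: 1, pos: 99, depth: 2}
	r.counts[0][0] = 2
	r.perReadA = make([][featureSize]byte, 2)

	out := marshalRow(nil, r)
	fmt.Println(len(out))                              // 16 + 40 + 4 + 2*6 bytes
	fmt.Println(binary.LittleEndian.Uint32(out[8:12])) // pos read back from [8..12)
}
```

Appending into a caller-supplied scratch slice mirrors the shape of MarshalPileupRow's signature and keeps per-row allocations down when the buffer is reused.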
func ReadBaseStrandTsvIntoChannel ¶
func ReadBaseStrandTsvIntoChannel(reader *tsv.Reader, c chan []BaseStrandTsvRow, bufferLen int, fileName string, wg *sync.WaitGroup)
ReadBaseStrandTsvIntoChannel reads a basestrand.tsv file from the given tsv.Reader into the given channel.
func WriteBaseStrandToTSV ¶
func WriteBaseStrandToTSV(piles []BaseStrandPile, refNames []string, w io.Writer) (err error)
WriteBaseStrandToTSV writes a []BaseStrandPile as a TSV.
func WriteBaseStrandTsv ¶
func WriteBaseStrandTsv(rows []BaseStrandTsvRow, writer io.Writer) error
WriteBaseStrandTsv writes a basestrand.tsv file to the given writer.
func WriteBaseStrandsRio ¶
func WriteBaseStrandsRio(piles []BaseStrandPile, refNames []string, out io.Writer) error
WriteBaseStrandsRio writes the given BaseStrand-pileup entries to the given writer, using recordio.
Types ¶
type BaseStrandPile ¶
BaseStrandPile represents a single pileup entry with a count for every (base, strand) tuple.
- Pos is zero-based; it is necessary to add 1 when converting to most text formats (but not BED).
- In Counts[][], base is the major dimension, with pileup.BaseA=0, C=1, G=2, T=3. Strand is the minor dimension, with strandFwd=0 and strandRev=1. TODO(cchang): strandFwd=0, strandRev=1 is inconsistent with bio/pileup's internal representation (which has None=0). We have enough other code at this point with Fwd=0, Rev=1 that it's probably time to change bio/pileup's representation to match that.
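The two conventions above (zero-based Pos, and Counts indexed by base first and strand second) can be illustrated with a small sketch. The struct definition is not shown on this page, so the pile type, its field types, and the helper names below are assumptions made for illustration.

```go
package main

import "fmt"

// Base and strand indices as described above.
const (
	baseA = iota
	baseC
	baseG
	baseT
	nBase
)

const (
	strandFwd = 0
	strandRev = 1
)

// pile is a hypothetical stand-in for BaseStrandPile; field types are assumed.
type pile struct {
	Pos    uint32           // zero-based position
	Counts [nBase][2]uint32 // [base][strand]
}

// oneBasedPos converts the zero-based Pos to the one-based coordinate
// used by most text formats (BED keeps zero-based starts).
func oneBasedPos(p pile) uint32 { return p.Pos + 1 }

// baseTotal sums the forward and reverse counts for one base.
func baseTotal(p pile, base int) uint32 {
	return p.Counts[base][strandFwd] + p.Counts[base][strandRev]
}

func main() {
	p := pile{Pos: 41}
	p.Counts[baseG][strandFwd] = 2
	p.Counts[baseG][strandRev] = 3

	fmt.Println(oneBasedPos(p))      // position as written to a one-based format
	fmt.Println(baseTotal(p, baseG)) // G observations across both strands
}
```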
func ReadBaseStrandsRio ¶
func ReadBaseStrandsRio(rs io.ReadSeeker) (piles []BaseStrandPile, refNames []string, err error)
ReadBaseStrandsRio reads BaseStrand piles from a recordio file written by WriteBaseStrandsRio.
type BaseStrandTsvRow ¶
type BaseStrandTsvRow struct {
	Chr  string `tsv:"#CHROM"` // Chromosome
	Pos  int64  `tsv:"POS"`    // Position in chromosome
	Ref  string `tsv:"REF"`    // Reference base
	FwdA int64  `tsv:"A+"`     // A count on the forward strand
	RevA int64  `tsv:"A-"`     // A count on the reverse strand
	FwdC int64  `tsv:"C+"`     // C count on the forward strand
	RevC int64  `tsv:"C-"`     // C count on the reverse strand
	FwdG int64  `tsv:"G+"`     // G count on the forward strand
	RevG int64  `tsv:"G-"`     // G count on the reverse strand
	FwdT int64  `tsv:"T+"`     // T count on the forward strand
	RevT int64  `tsv:"T-"`     // T count on the reverse strand
}
BaseStrandTsvRow represents a single row of a basestrand.tsv file.
func ReadBaseStrandTsv ¶
func ReadBaseStrandTsv(r io.Reader) ([]BaseStrandTsvRow, error)
ReadBaseStrandTsv reads a basestrand.tsv file from the given io.Reader.
func ReadSingleStrandBaseStrandTsv ¶
func ReadSingleStrandBaseStrandTsv(forward, reverse io.Reader) ([]BaseStrandTsvRow, error)
ReadSingleStrandBaseStrandTsv reads strand-specific strand.<fwd/rev>.snp.tsv files from the given forward and reverse io.Readers.
type BaseStrandUnmarshaller ¶
type BaseStrandUnmarshaller struct {
// contains filtered or unexported fields
}
BaseStrandUnmarshaller is used to allocate memory in large blocks during unmarshalling, to prevent contention with other goroutines.
func (*BaseStrandUnmarshaller) UnmarshalBaseStrand ¶
func (b *BaseStrandUnmarshaller) UnmarshalBaseStrand(in []byte) (out interface{}, err error)
type PerReadFeatures ¶
type PileupPayload ¶
type PileupPayload struct {
	Depth   uint32
	Counts  [pileup.NBaseEnum][2]uint32
	PerRead [pileup.NBase][]PerReadFeatures
}
PileupPayload is a container for all types of pileup data which may be associated with a single position. It does not store the position itself, or a tag indicating which parts of the container are used.
Depth and count values are of type uint32 instead of int to reduce cache footprint.
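The footprint difference can be checked with unsafe.Sizeof. The sketch assumes pileup.NBaseEnum is 5, which is consistent with the 40-byte counts block in the serialized format; the constant name nBaseEnum is local to this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// nBaseEnum is assumed to equal pileup.NBaseEnum (5 values).
const nBaseEnum = 5

// countsSizes returns the in-memory size in bytes of a counts table
// declared with uint32 elements and with int elements.
func countsSizes() (u32Bytes, intBytes uintptr) {
	var a [nBaseEnum][2]uint32
	var b [nBaseEnum][2]int
	return unsafe.Sizeof(a), unsafe.Sizeof(b)
}

func main() {
	u32Bytes, intBytes := countsSizes()
	// The uint32 table is always 40 bytes; the int table is twice that
	// on platforms where int is 64 bits wide.
	fmt.Println(u32Bytes, intBytes)
}
```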
type PileupRow ¶
type PileupRow struct {
	FieldsPresent uint32 // field... flags
	RefID         uint32
	Pos           uint32
	Payload       PileupPayload
}
PileupRow contains all pileup data associated with a single position, along with the position itself and the set of PileupPayload fields used.
The main loop splits the genome into shards, and generates lightly compressed (zstd level 1) per-shard PileupRow recordio files. Then, the per-shard files are read in sequence and converted to the final requested output format. This is a bit inefficient, but we can easily afford it.