ATACdemultiplex

command
v0.47.0
Published: Nov 9, 2020 License: MIT Imports: 14 Imported by: 0

README

Demultiplexer for scATAC-Seq

This package inserts snATAC-Seq index tags into the read ID. The barcoding and tagging strategy follows the protocol described in Preissl et al., 2018 (doi:10.1038/s41593-018-0079-3). In that protocol, each barcode consists of four 8-bp indexes (i5, i7, p5 and p7). The first 8 bp of Index1 correspond to the p7 barcode and the last 8 bp to the i7 barcode. The first 8 bp of Index2 correspond to the i5 barcode and the last 8 bp to the p5 barcode.
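The index layout described above can be sketched in Go as follows. This is a minimal illustration, not the package's own code: the function names splitIndex and barcodeTag, the p7+i7+i5+p5 concatenation order, and the "@tag:readID" header layout are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// splitIndex returns the first and last tagLen bases of an index read.
func splitIndex(seq string, tagLen int) (first, last string, err error) {
	if tagLen <= 0 || len(seq) < 2*tagLen {
		return "", "", errors.New("invalid tag length or index read too short")
	}
	return seq[:tagLen], seq[len(seq)-tagLen:], nil
}

// barcodeTag extracts p7/i7 from Index1 and i5/p5 from Index2.
// The concatenation order used for the tag is an assumption of this sketch.
func barcodeTag(index1, index2 string, tagLen int) (string, error) {
	p7, i7, err := splitIndex(index1, tagLen)
	if err != nil {
		return "", fmt.Errorf("Index1: %w", err)
	}
	i5, p5, err := splitIndex(index2, tagLen)
	if err != nil {
		return "", fmt.Errorf("Index2: %w", err)
	}
	return p7 + i7 + i5 + p5, nil
}

func main() {
	tag, err := barcodeTag("AAAAAAAACCCCCCCC", "GGGGGGGGTTTTTTTT", 8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Hypothetical read ID layout: the tag is placed in front of the original ID.
	fmt.Printf("@%s:%s\n", tag, "read_1")
}
```

The tagLen parameter plays the role of the -taglength option (default 8).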

Usage

Usage of ATACdemultiplex:
  -compressionMode int
        compressionMode for native bzip2 lib
         (1 faster -> 9 smaller) <default: 6> (default 6)
  -fastq_I1 string
        fastq index file index paired read 1
  -fastq_I2 string
        fastq index file index paired read 2
  -fastq_R1 string
        fastq read file index paired read 1
  -fastq_R2 string
        fastq read file index paired read 2
  -guess_nb_lines
        guess automatically position of the lines (for mulithread). May be not safe in some situation
  -index_no_replicate string
        <OPTIONAL> path toward indexes when only 1 replicate is used
  -index_replicate_r1 string
        <OPTIONAL> path toward indexes of R1 replicates (i.e. replicate number 1)
  -index_replicate_r2 string
        <OPTIONAL> path toward indexes of R2 replicates (i.e. replicate number 2)
  -max_nb_mistake int
        Maximum number of mistakes allowed to assign a reference read id (default 2) (default 2)
  -max_nb_reads int
        <OPTIONAL> max number of reads to process (default 0 => None)
  -nbThreads int
        number of threads to use (default 1)
  -output_tag_name string
        tag for the output file names (default None)
  -taglength int
        <OPTIONAL> number of nucleotides to consider at the end
         and begining (default 8) (default 8)
  -use_bzip2_go_lib
        use bzip2 go library instead of native C lib (slower)
  -write_extensive_logs
        write extensive logs (can consume extra RAM memory and slower the process)
  -write_logs
        write logs (might slower the execution time)
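To illustrate what a mistake tolerance such as -max_nb_mistake could mean in practice, here is a minimal sketch of mismatch-tolerant barcode matching. It assumes that a "mistake" is a single-base substitution (Hamming distance) and that ambiguous matches are rejected; the usage text does not state either point, and the function names are hypothetical.

```go
package main

import "fmt"

// hamming counts the positions at which a and b differ.
// It returns -1 when the two strings have different lengths.
func hamming(a, b string) int {
	if len(a) != len(b) {
		return -1
	}
	d := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			d++
		}
	}
	return d
}

// closestIndex returns the reference index that is uniquely closest to
// observed, provided it lies within maxMistakes mismatches.
// Rejecting ties is a design choice of this sketch.
func closestIndex(observed string, refs []string, maxMistakes int) (string, bool) {
	best, bestDist := "", maxMistakes+1
	found, tie := false, false
	for _, ref := range refs {
		d := hamming(observed, ref)
		if d < 0 || d > maxMistakes {
			continue
		}
		switch {
		case d < bestDist:
			best, bestDist, found, tie = ref, d, true, false
		case d == bestDist:
			tie = true
		}
	}
	if !found || tie {
		return "", false
	}
	return best, true
}

func main() {
	refs := []string{"GTCATGAA", "CAAGTTCA", "TATGCCAT"}
	match, ok := closestIndex("GTCATGAT", refs, 2)
	fmt.Println(match, ok)
}
```

Rejecting ties avoids assigning a read to an arbitrary barcode when two references are equally close.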

Multithreading

Speed does not scale linearly with the number of CPUs used, because each thread must first reach its starting line, which can take time for very large files (e.g. a 4.6 GB fastq file can contain up to 10**10 lines).
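The cost of reaching a starting line can be made concrete with a small sketch. It assumes that the file is divided into contiguous blocks of whole FASTQ records (4 lines each) and that each thread skips every line before its block; the chunk type and partition function are hypothetical and not taken from the package.

```go
package main

import "fmt"

// chunk describes the work of one thread: how many lines to skip
// and how many lines to process afterwards.
type chunk struct {
	startLine int
	nbLines   int
}

// partition splits totalLines into at most nbThreads contiguous chunks,
// each aligned on 4-line FASTQ record boundaries.
func partition(totalLines, nbThreads int) []chunk {
	if nbThreads < 1 {
		nbThreads = 1
	}
	records := totalLines / 4
	base, extra := records/nbThreads, records%nbThreads
	chunks := make([]chunk, 0, nbThreads)
	start := 0
	for t := 0; t < nbThreads; t++ {
		n := base
		if t < extra {
			n++ // spread the remainder over the first threads
		}
		if n == 0 {
			continue // more threads than records
		}
		chunks = append(chunks, chunk{startLine: start * 4, nbLines: n * 4})
		start += n
	}
	return chunks
}

func main() {
	for t, c := range partition(40, 3) {
		fmt.Printf("thread %d: skip %d lines, process %d lines\n", t, c.startLine, c.nbLines)
	}
}
```

Under this scheme the last thread skips most of the file before doing any useful work, which is consistent with the sub-linear scaling described above.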

Warning

  • The input files must be compressed in the bzip2 format.
  • The 4 input files must be paired, i.e. the reads appear in the same order and match across the files.
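The pairing requirement can be checked before running the tool. The sketch below reads several already-decompressed FASTQ streams in lockstep and compares the read IDs of each record. It is not part of ATACdemultiplex: the function names are hypothetical, and taking the read ID to be the header text up to the first whitespace is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readID returns the header token up to the first whitespace, without '@'.
func readID(header string) string {
	fields := strings.Fields(header)
	if len(fields) == 0 {
		return ""
	}
	return strings.TrimPrefix(fields[0], "@")
}

// checkPaired scans all streams line by line and returns an error at the
// first record whose IDs differ, or when the streams have different lengths.
// bufio.Scanner's default line limit (64 KiB) is enough for short reads.
func checkPaired(readers ...io.Reader) error {
	scanners := make([]*bufio.Scanner, len(readers))
	for i, r := range readers {
		scanners[i] = bufio.NewScanner(r)
	}
	for line := 0; ; line++ {
		ended := 0
		for _, sc := range scanners {
			if !sc.Scan() {
				if err := sc.Err(); err != nil {
					return err
				}
				ended++
			}
		}
		if ended == len(scanners) {
			return nil
		}
		if ended > 0 {
			return fmt.Errorf("line %d: files have different lengths", line+1)
		}
		if line%4 != 0 {
			continue // only header lines carry the read ID
		}
		want := readID(scanners[0].Text())
		for i, sc := range scanners[1:] {
			if got := readID(sc.Text()); got != want {
				return fmt.Errorf("line %d: file %d has ID %q, want %q", line+1, i+2, got, want)
			}
		}
	}
}

// fromStrings wraps in-memory FASTQ text as readers, for demonstration only.
func fromStrings(contents ...string) []io.Reader {
	readers := make([]io.Reader, len(contents))
	for i, c := range contents {
		readers[i] = strings.NewReader(c)
	}
	return readers
}

func main() {
	r1 := "@read1 1:N:0\nACGT\n+\nIIII\n@read2 1:N:0\nTTGA\n+\nIIII\n"
	i1 := "@read1 1:N:0\nGTCATGAA\n+\nIIIIIIII\n@read3 1:N:0\nCAAGTTCA\n+\nIIIIIIII\n"
	fmt.Println(checkPaired(fromStrings(r1, i1)...))
}
```

In real use the readers would wrap the decompressed bzip2 streams of the four input files.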

bzip2 decompression and encoding

  • By default, this software uses the C header "bzlib.h", which provides the fastest implementation of the bzip2 library. It is also possible to use a Go bzip2 library instead with the option -use_bzip2_go_lib, but it is significantly slower (1.2 to 1.5x).

I5, I7, P5, P7 indexes

The indexes can be provided as input with -index_replicate_r1 and -index_replicate_r2, or with -index_no_replicate. In the default protocol, the first 8 bp of Index1 correspond to the p7 barcode and the last 8 bp to the i7 barcode. The first 8 bp of Index2 correspond to the i5 barcode and the last 8 bp to the p5 barcode. The format of the input files should be similar to:

i7    GTCATGAA
i7    CAAGTTCA
i7    TATGCCAT
i7    GAATTACG
i5    GGATACTA
i5    TAAGATCC
...
<indexID>\t<index>
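A file in this format can be read with a few lines of Go. The sketch below is illustrative and not the package's own parser; the function names are hypothetical. Because the listing above shows spaces while the format line specifies a tab, the sketch splits on any whitespace so that both variants are accepted.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// parseIndexLine splits one "<indexID><whitespace><index>" line.
func parseIndexLine(line string) (id, seq string, err error) {
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return "", "", fmt.Errorf("expected 2 fields, got %d", len(fields))
	}
	return fields[0], fields[1], nil
}

// parseIndexes groups index sequences by index ID, skipping blank lines.
func parseIndexes(r io.Reader) (map[string][]string, error) {
	out := make(map[string][]string)
	sc := bufio.NewScanner(r)
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		id, seq, err := parseIndexLine(line)
		if err != nil {
			return nil, fmt.Errorf("line %d: %w", n, err)
		}
		out[id] = append(out[id], seq)
	}
	return out, sc.Err()
}

func main() {
	input := "i7\tGTCATGAA\ni7\tCAAGTTCA\ni5\tGGATACTA\ni5\tTAAGATCC\n"
	indexes, err := parseIndexes(strings.NewReader(input))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("i7:", indexes["i7"])
	fmt.Println("i5:", indexes["i5"])
}
```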

Replicate demultiplexing

The options -index_replicate_r1 and -index_replicate_r2 allow the reads to be demultiplexed according to 2 replicates.

Example

wget http://enhancer.sdsc.edu/spreissl/Test/SP176_177_P56_I1.fastq.bz2
wget http://enhancer.sdsc.edu/spreissl/Test/SP176_177_P56_I2.fastq.bz2
wget http://enhancer.sdsc.edu/spreissl/Test/SP176_177_P56_R1.fastq.bz2
wget http://enhancer.sdsc.edu/spreissl/Test/SP176_177_P56_R2.fastq.bz2

ATACdemultiplex -fastq_I1 SP176_177_P56_I1.fastq.bz2 -fastq_I2 SP176_177_P56_I2.fastq.bz2 -fastq_R1 SP176_177_P56_R1.fastq.bz2 -fastq_R2 SP176_177_P56_R2.fastq.bz2 -max_nb_reads 100000

Performance

Depending on the options chosen, the throughput is between 1.5 MB/s (for bzip fastq files) and 2.0 MB/s. We provide a multi-threading option (-nbThreads) which greatly improves the speed of the computation. However, the speed-up is sub-linear rather than linear in the number of threads, because each thread first needs to reach its starting line, which takes some time.

Contact

Olivier Poirion (PhD) * oporion@ucsd.edu
