Downsampling

Data Types: VCF, SAM, BAM, FASTQ, GVF, 23andMe

Usage

genocat --downsample <rate>[,<shard>]

Description

Shows one line (or one read, in the case of FASTQ) out of every rate lines. The optional shard parameter (0-based) determines which line within each group of rate consecutive lines is shown. The default value of shard is 0.
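The selection rule described above can be illustrated with a minimal Python sketch. This is not genocat's implementation; the function name downsample and its argument checks are hypothetical, and the sketch assumes that "one in every rate" means keeping the item whose 0-based position modulo rate equals shard.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def downsample(items: Iterable[T], rate: int, shard: int = 0) -> Iterator[T]:
    """Yield the item at 0-based position `shard` in each group of `rate` items.

    Hypothetical helper for illustration only; an "item" stands for a line,
    or for a whole read in the case of FASTQ.
    """
    # Assumption of this sketch: shard must address a position inside the group.
    if rate < 1 or not 0 <= shard < rate:
        raise ValueError("expected rate >= 1 and 0 <= shard < rate")
    for index, item in enumerate(items):
        if index % rate == shard:
            yield item


# Nine items numbered 0..8, keeping position 1 of every 3.
print(list(downsample(range(9), rate=3, shard=1)))  # [1, 4, 7]
```

With rate 3 and shard 1, the sketch keeps items 1, 4 and 7 of nine, which corresponds to the second, fifth and eighth reads of a nine-read file.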

Example:

Getting the middle read of every 3 consecutive FASTQ reads (i.e. read 1 of each 0-based group {0,1,2}):

$ genocat my-file.fq.genozip

@A00910:85:HYGWJDSXX:1:1101:3025:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTTGGGGGTGGGGATCCCTATCTTAGCTGTTGCAATCCCTGGGCTGCTTCAGTGTTAATAACATTCCAAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:8160:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NATTATGAGAGAGTGCTTTTTACAATGTTAATGACATGTTATAATAAAGTAATCTTACAATAAACAAGAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:9028:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NCTACAATGTGTGACAACAATAATGTAAAAGGTAGATGAAATTAAAGTACCTAGCAATATTAGGAAATTG
+
#FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFF:F:,FF
@A00910:85:HYGWJDSXX:1:1101:15067:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTGTAGCATGCTCTTTGGTGCAAATTGACGAGCAGATTCTAAAAGTCACAGAGAAATGCAAAAGACCCTG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:
@A00910:85:HYGWJDSXX:1:1101:16007:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTTCAGAGGCTTCCGGCTAAATAGTAATACAAGTAGCACAAACAACAGAGTGAGAATGTTTATCACACTC
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:16984:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTTCTATTTTGCCCCTGAGGGTGCATCCCGAAGAGGGAAGCTATTGATTTTTAACACTAGACACATAAAC
+
#:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:20636:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTATATACCTATTTTCATATTTTTGTCAGTGTTGGTCAGATTTTTAGAAGTGAGATTTGCTAGCAAAAAT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:21811:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NCTTTCAAGAGCAGCCCCAGCTCCTTAAGCTGCTGGTCCTGGTGCATCTGCTGACTTTCATGTAGAAGAT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:1714:1016 1:N:0:CAACGAGAGC+GAATTGAGTG
NATATTGGTCTTATGATCATAAATTTTCTCAGCATTTATATTCTGAAGAATATATATTTCCTGTTTATTT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFF

$ genocat my-file.fq.genozip --downsample 3,1

@A00910:85:HYGWJDSXX:1:1101:8160:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NATTATGAGAGAGTGCTTTTTACAATGTTAATGACATGTTATAATAAAGTAATCTTACAATAAACAAGAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:16007:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTTCAGAGGCTTCCGGCTAAATAGTAATACAAGTAGCACAAACAACAGAGTGAGAATGTTTATCACACTC
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:21811:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NCTTTCAAGAGCAGCCCCAGCTCCTTAAGCTGCTGGTCCTGGTGCATCTGCTGACTTTCATGTAGAAGAT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF