Compression
Genozip can run on any type of file, but it is optimized to compress genomic file formats.
Simple compression and decompression
genozip sample.bam
genounzip sample.bam.genozip
Compressing and decompressing FASTQ with paired-end reads with a reference
First, create a reference file. This is a one-time step that may take up to 10 minutes:
genozip --make-reference myfasta.fa
Second, compress or decompress your file(s) using the reference:
genozip --reference myfasta.ref.genozip --pair mysample-R1.fastq.gz mysample-R2.fastq.gz
genounzip --reference myfasta.ref.genozip mysample-R1+2.fastq.genozip
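The two steps above can be combined in a small wrapper. This is a minimal sketch, not part of genozip: the function name compress_pair is hypothetical, and it assumes that the reference built from NAME.fa is written next to it as NAME.ref.genozip, as the myfasta.fa example suggests.

```shell
# Sketch: build the reference only if it is missing, then compress a FASTQ pair.
# Assumption: "genozip --make-reference NAME.fa" produces NAME.ref.genozip.
compress_pair() {
    local fasta="$1" r1="$2" r2="$3"
    # Strip the last extension (.fa) and append .ref.genozip
    local ref="${fasta%.*}.ref.genozip"

    # One-time step: skip it when the reference file already exists
    if [ ! -f "$ref" ]; then
        genozip --make-reference "$fasta" || return 1
    fi

    # Compress both read files together against the reference
    genozip --reference "$ref" --pair "$r1" "$r2"
}

# Usage:
#   compress_pair myfasta.fa mysample-R1.fastq.gz mysample-R2.fastq.gz
```

Checking for the reference file first keeps repeated runs from paying the one-time cost again.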
Compressing and decompressing multiple files into a tar file (preserving directory structure)
Using genozip in a pipeline
genocat mysample.sam.genozip | samtools - .....
my-sam-outputing-method | genozip - --input sam --output mysample.sam.genozip
Viewing compression stats
genocat sample.bam.genozip --stats
Lookups, downsampling and other subsets
genocat --regions chr1:10000-20000 mysamples.vcf.genozip
Displays a specific region.
genocat --regions ^Y,MT mysample.bam.genozip
Displays all alignments except Y and MT contigs.
genocat --regions chrM GRCh38.fa.genozip
Displays the sequence of chrM.
genocat --samples SMPL1,SMPL2 mysamples.vcf.genozip
Displays 2 samples.
genocat --grep 1101:2392 myreads.fq.genozip
Displays reads that have “1101:2392” anywhere in the description.
genocat --downsample 10 mysample.fq.genozip
Displays 1 in 10 reads.
Note: These are just some examples. There are many more subsetting options; see genocat.
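Because genocat writes to standard output, the --regions option can be looped to save one subset per region. The following is a rough sketch only: the function name split_by_region and the subset.*.txt output names are hypothetical, chosen for illustration.

```shell
# Sketch: write one subset file per region with "genocat --regions".
# The output naming scheme (subset.REGION.txt) is an assumption of this example.
split_by_region() {
    local input="$1"
    shift
    local region out
    for region in "$@"; do
        # Replace ':' with '_' so the region is convenient to use in a filename
        out="subset.$(printf '%s' "$region" | tr ':' '_').txt"
        genocat --regions "$region" "$input" > "$out" || return 1
    done
}

# Usage:
#   split_by_region mysamples.vcf.genozip chr1:10000-20000 chrM
```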
Compressing even better, with some minor modifications of the data
genozip file.bam --optimize
Note: compression with --optimize is not lossless - see genozip for details.
Compressing faster, sacrificing a bit of compression ratio
genozip file.bam --fast
Encrypting (256 bit AES)
Converting SAM/BAM to FASTQ
Generating a samtools/bcftools index file when uncompressing
genounzip file.bam.genozip --index
Calculating the MD5 of the underlying textual file (also included in --test)
genozip file.vcf --md5
genounzip file.vcf.genozip --md5
genols file.vcf.genozip
Compressing and then verifying that the compressed file decompresses correctly
genozip file.vcf --test
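When compressing many files, the --test step can be applied to each file in turn and the failures collected. This is a minimal sketch under one assumption: that genozip returns a non-zero exit status when compression or verification fails. The function name compress_and_verify is hypothetical.

```shell
# Sketch: compress each file with --test and report which ones failed.
# Assumption: genozip exits non-zero if compression or the test fails.
compress_and_verify() {
    local f failed=0
    for f in "$@"; do
        if genozip "$f" --test; then
            echo "OK: $f"
        else
            echo "FAILED: $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every file compressed and verified
    [ "$failed" -eq 0 ]
}

# Usage:
#   compress_and_verify file1.vcf file2.vcf
```

Continuing past a failed file, rather than stopping at the first error, gives a complete list of problem files in one run.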
Questions? support@genozip.com