Genozip

A universal genomic file compressor - for BAM, FASTQ, VCF and other genomic files

Genozip is a universal compressor for genomic files such as FASTQ, BAM/CRAM and VCF. It also handles many other file formats, including non-genomic files.

For Illumina .bam and .fastq.gz files, the typical gain over gzip is around 4X. For aligned PacBio and Oxford Nanopore .bam files, the gain is typically around 2X. For .vcf.gz files, the gain over gzip is typically 3-6X.

Yes, Genozip can compress already-compressed files (.gz, .bz2, .xz, .bam, .cram).

The compression is lossless - the decompressed file is 100% identical to the original file (some exceptions apply).
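
One way to check the lossless claim on your own data is a round trip: compress, decompress, and compare against the original byte for byte. The sketch below is illustrative only. The file name `sample.vcf` is hypothetical, and the script assumes that genozip writes its output as the input name plus `.genozip` and that genounzip restores the original name. It works in a temporary directory so your own files are untouched, and it does nothing if genozip is not installed.

```shell
#!/bin/sh
# Hypothetical input file (a bare file name in the current directory).
original="sample.vcf"
# Assumption: genozip names its output <input>.genozip by default.
compressed="${original}.genozip"

roundtrip_ok="unknown"
if command -v genozip >/dev/null 2>&1 && [ -f "$original" ]; then
    workdir=$(mktemp -d)
    cp "$original" "$workdir/"
    (
        cd "$workdir" || exit 1
        cp "$original" original.copy    # reference copy for the comparison
        genozip "$original"             # compress
        rm -f "$original"               # make room for the restored file
        genounzip "$compressed"         # decompress
        cmp "$original" original.copy   # byte-for-byte comparison
    ) && roundtrip_ok="yes" || roundtrip_ok="no"
    rm -rf "$workdir"
fi
echo "round trip identical: $roundtrip_ok"
```

The example uses an uncompressed input on purpose: the stated exceptions may apply to other inputs, so a byte-level comparison is not guaranteed to succeed for every file type.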

Genozip consists of four command line tools:

  • genozip compresses files

  • genounzip decompresses files

  • genols shows metadata of compressed files and directories

  • genocat is the workhorse for using genozip in analytical pipelines:
    • Display the contents of a compressed file

    • Subset a compressed file - show a specific part of its contents

    • Translate a compressed file to another format (e.g. BAM to FASTQ)

    • Analyze a compressed file (e.g. show the sex or coverage)
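
As a rough illustration of how three of these tools fit together, the sketch below compresses a file, lists its metadata, and prints the first lines of its contents. The file name `sample.vcf` is hypothetical, and the `.genozip` output name is an assumption about the default naming. Only the plain invocations (tool name plus file name) are used; options for subsetting, translating and analyzing are not shown here. The script does nothing if genozip is not installed or the input file is missing.

```shell
#!/bin/sh
# Hypothetical input file, for illustration only.
sample="sample.vcf"
# Assumption: genozip writes <input>.genozip next to the input.
compressed="${sample}.genozip"

if command -v genozip >/dev/null 2>&1 && [ -f "$sample" ]; then
    genozip "$sample"                   # compress the file
    genols "$compressed"                # show metadata of the compressed file
    genocat "$compressed" | head -n 5   # display the first lines of its contents
    status="ran"
else
    echo "genozip is not installed or $sample is missing; nothing to do"
    status="skipped"
fi
```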


Installing

  • From Conda (Linux & Mac):
    conda config --add channels conda-forge
    conda install genozip

  • Linux binaries (x86-64, statically linked, works on most Linux systems)

  • Windows installer

  • Compile it yourself from GitHub (tested on Linux, Mac and Windows):
    Download: latest release
    make

License

Genozip is a paid professional product (Pricing), provided under this license. Genozip is also available free of charge for academic and training use (see FAQ).


Contact

Technical questions, bug reports and feature requests: support@genozip.com

Subscription inquiries: sales@genozip.com


Publications & Citing

Lan, D., et al. (2021) Genozip: a universal extensible genomic data compressor. Bioinformatics, 37, 2225–2230.

Lan, D., et al. (2020) genozip: a fast and efficient compression tool for VCF files. Bioinformatics, 36, 4091–4092.

Lan, D. (2021) The Variant Call Format - Dual Coordinates Extension (DVCF) Specification. doi:10.6084/m9.figshare.14685816 (preprint).



THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS, COPYRIGHT HOLDERS OR DISTRIBUTORS OF THIS SOFTWARE BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.