Genozip

A universal compressor for genomic files


About Genozip


Genozip is a universal compressor for genomic files. It is optimized to compress FASTQ, SAM/BAM/CRAM, VCF/BCF, FASTA, GFF3/GVF, PHYLIP, Chain, Kraken and 23andMe files, but it can also compress any other file (including non-genomic files).

Typically, compressing already-compressed files such as .fastq.gz, .bam or .vcf.gz achieves a 2X-5X improvement over the existing compression, and much higher ratios are achieved in some other cases.
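
As an illustration of how such an improvement can be measured, the ratio is simply the size of the existing compressed file divided by the size of the Genozip output. The following is a minimal sketch, not part of Genozip itself: size_ratio is a hypothetical helper, and the file names in the example comment are placeholders.

```shell
# size_ratio OLD NEW: print size(OLD) / size(NEW) to two decimals.
# Hypothetical helper; both paths are supplied by the caller.
size_ratio() {
    # Both files must exist before we measure them.
    [ -f "$1" ] && [ -f "$2" ] || return 1
    # tr strips the padding that some wc implementations add.
    old_bytes=$(wc -c < "$1" | tr -d ' ')
    new_bytes=$(wc -c < "$2" | tr -d ' ')
    # Avoid dividing by zero for an empty output file.
    [ "$new_bytes" -gt 0 ] || return 1
    awk -v o="$old_bytes" -v n="$new_bytes" 'BEGIN { printf "%.2f\n", o / n }'
}

# Example with placeholder names:
#   size_ratio sample.fastq.gz sample.fastq.genozip
```

A printed value of 3.00 would mean the Genozip file is one third the size of the existing compressed file.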

Yes, Genozip can compress already-compressed files (.gz .bz2 .xz .bam .cram).

The compression is lossless - the decompressed file is 100% identical to the original file (some exceptions apply).
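
One way to check the lossless claim on your own data is to compare the output of genocat with the original file byte for byte. This is a minimal sketch under stated assumptions: same_content is a hypothetical helper, the original is assumed to be a plain uncompressed file (for example a .vcf), and the archive is assumed to have been created from it by genozip.

```shell
# same_content ORIGINAL ARCHIVE: report whether genocat's output for
# ARCHIVE is byte-for-byte identical to ORIGINAL.
# Prints "skipped" when genocat is not installed or a file is missing.
same_content() {
    if ! command -v genocat >/dev/null 2>&1 || [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "skipped"
        return 0
    fi
    # cmp reads the decompressed stream from stdin ("-") and compares it
    # with the original; -s suppresses the per-byte report.
    if genocat "$2" | cmp -s - "$1"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Example with placeholder names:
#   same_content sample.vcf sample.vcf.genozip
```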

Genozip consists of four command line tools:

  • genozip compresses files

  • genounzip decompresses files

  • genols shows metadata of compressed files and directories

  • genocat is the workhorse for using genozip in analytical pipelines:
    • Display the contents of a compressed file - possibly piping it into a downstream tool

    • Subset a compressed file - show a specific part of its contents

    • Translate a compressed file to another format (e.g. BAM to FASTQ or Multi-FASTA to PHYLIP)

    • Analyze a compressed file (e.g. showing the sex, coverage or compression statistics)
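
The tools listed above can be combined along these lines. This is a sketch rather than a reference workflow: count_records is a hypothetical helper, sample.vcf is a placeholder file name, and the sketch assumes that genozip writes its output next to the input with a .genozip suffix while leaving the input in place. Only the bare commands are used, with no options.

```shell
# count_records FILE: compress FILE, show the archive's metadata, then
# count its non-header lines by piping genocat into grep.
# FILE is a placeholder for a plain-text VCF. Prints "skipped" when
# genozip is not installed or FILE does not exist.
count_records() {
    if ! command -v genozip >/dev/null 2>&1 || [ ! -f "$1" ]; then
        echo "skipped"
        return 0
    fi
    genozip "$1"                          # assumption: writes "$1.genozip"
    genols "$1.genozip"                   # metadata of the compressed file
    genocat "$1.genozip" | grep -vc '^#'  # downstream tool: count lines not starting with '#'
}

count_records sample.vcf   # hypothetical file name
```

Here grep stands in for any downstream tool that reads from standard input.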


Installing

From Conda (Linux & Mac):
conda config --add channels conda-forge
conda install genozip
Linux binaries (x86-64, statically linked, works on most Linux systems):
Windows installer:
Compile it yourself from GitHub (tested on Linux, Mac and Windows):
Download: latest release
make
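
After installing by any of these routes, a quick check is to confirm that the four tools are on the PATH. The snippet below is a minimal sketch: check_tools is a hypothetical helper and it relies only on the standard shell built-in command -v, not on any Genozip option.

```shell
# check_tools: report which of the four Genozip tools are on PATH.
check_tools() {
    missing=0
    for tool in genozip genounzip genols genocat; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: not found"
            missing=$((missing + 1))
        fi
    done
    echo "$missing of 4 tools missing"
}

check_tools
```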

Publications & Citing


Lan, D., et al. (2021) Genozip: a universal extensible genomic data compressor. Bioinformatics, 37, 2225–2230

Lan, D., et al. (2020) genozip: a fast and efficient compression tool for VCF files. Bioinformatics, 36, 4091–4092

Lan, D. (2021) The Variant Call Format - Dual Coordinates Extension (DVCF) Specification. doi:10.6084/m9.figshare.14685816 (preprint)


Follow me on ResearchGate or LinkedIn


Contact

Technical questions, bug reports and feature requests: support@genozip.com

Commercial license inquiries: sales@genozip.com

Requests for support for compression of additional public or proprietary file formats: sales@genozip.com


License

Genozip’s License allows for free non-commercial use, subject to certain conditions. For a commercial license, please contact sales@genozip.com.

THIS SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.