Genozip compression is lossless with respect to the underlying data: the data reconstructed during decompression is exactly identical to the original data that was compressed.
Verification of Losslessness
When uncompressing with genounzip or genocat, these tools use a digest to verify that the reconstructed data is exactly identical to the source data. Some exceptions apply; see Verifying file integrity.
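The idea of digest-based verification can be sketched in a few lines. This is only an illustration, not Genozip's implementation: the choice of SHA-256 and the helper names `stream_digest` and `is_lossless` are assumptions made for the example, and the text above does not say which digest algorithm Genozip uses.

```python
import hashlib
import io

def stream_digest(stream, chunk_size=1 << 20):
    """Return the hex digest of a binary stream, read in fixed-size chunks.

    SHA-256 is an assumption for this sketch; any digest works the same way.
    """
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

def is_lossless(original_stream, reconstructed_stream):
    """True if both streams carry byte-for-byte identical data."""
    return stream_digest(original_stream) == stream_digest(reconstructed_stream)

# Hypothetical sample data standing in for a genomic file.
original = b"chr1\t100\t.\tA\tG\n" * 1000
reconstructed = bytes(original)          # a faithful reconstruction
corrupted = original[:-2] + b"T\n"       # the last base was altered

same = is_lossless(io.BytesIO(original), io.BytesIO(reconstructed))
diff = is_lossless(io.BytesIO(original), io.BytesIO(corrupted))
print(same, diff)
```

Reading in chunks keeps memory use constant, which matters for files as large as typical genomic data.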
Exceptions to Losslessness
Certain options and inputs cause genozip to change the source data before compressing it. In these cases, the digest is not calculated. These cases are:
--optimize or any of the
VCF: Generating Dual-coordinate VCF files with
VCF: Compressing a Luft file (a lifted-over dual-coordinates file)
When genozip compresses a BGZF-compressed file, it first decompresses it to recover the original underlying data, and then compresses that data with Genozip. Likewise, when genounzip uncompresses a .genozip file, it recompresses the data back to BGZF if the original file was compressed with BGZF. Genozip records the parameters of the BGZF compression in the .genozip file (such as the estimated compression level), and genounzip attempts to recompress to BGZF using the same parameters. However, since there are many BGZF compression libraries, each with dozens of parameter combinations, genounzip's BGZF compression may achieve a slightly different compression level than that of the original file, so the final BGZF-compressed file may differ from the original BGZF file. Even then, the underlying data is still exactly identical and is verified with the digest.
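The following sketch illustrates why a recompressed file can differ from the original while the underlying data stays identical. It makes several assumptions: Python's `zlib` stands in for a BGZF library (BGZF is built on the same deflate algorithm but adds its own block structure), the two compression levels are arbitrary, and the sample data is hypothetical.

```python
import hashlib
import zlib

# Hypothetical FASTQ-like data standing in for the underlying file content.
data = b"@read1\nACGTACGTTTGACCA\n+\nIIIIIIIIIIIIIII\n" * 500

# The "original" file and the "recompressed" file use different settings,
# as could happen when the original compressor's parameters are only estimated.
original_file = zlib.compress(data, 9)
recompressed_file = zlib.compress(data, 1)

# The compressed bytes differ...
files_identical = original_file == recompressed_file

# ...but decompressing either one yields exactly the same data.
data_identical = zlib.decompress(original_file) == zlib.decompress(recompressed_file)

# A digest of the underlying data therefore still matches.
digests_match = (
    hashlib.sha256(zlib.decompress(original_file)).hexdigest()
    == hashlib.sha256(zlib.decompress(recompressed_file)).hexdigest()
)

print(files_identical, data_identical, digests_match)
```

This is why comparing the compressed files directly is not a reliable test of losslessness; the comparison has to be made on the decompressed data.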
genozip is capable of compressing files that are already compressed as .bz2, .xz or .gz. It does so by first uncompressing the file, and then recompressing the data with Genozip. However, in these cases, genounzip does not recompress the files back to .bz2, .xz or .gz.
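As a rough illustration of the "uncompress first" step, the sketch below recovers the underlying data from a .gz, .bz2 or .xz payload by inspecting its leading magic bytes. It uses only Python's standard library; the function name `underlying_data` is hypothetical, and this is not a description of how genozip detects formats internally.

```python
import bz2
import gzip
import lzma

# Leading magic bytes of each format, paired with a matching decompressor.
MAGIC = [
    (b"\x1f\x8b", gzip.decompress),      # .gz
    (b"BZh", bz2.decompress),            # .bz2
    (b"\xfd7zXZ\x00", lzma.decompress),  # .xz
]

def underlying_data(raw):
    """Return the uncompressed content of raw, or raw itself if no magic matches."""
    for magic, decompress in MAGIC:
        if raw.startswith(magic):
            return decompress(raw)
    return raw

# Hypothetical payload standing in for a genomic file.
payload = b"chr1\t100\t.\tA\tG\n" * 100

for compress in (gzip.compress, bz2.compress, lzma.compress):
    assert underlying_data(compress(payload)) == payload
```

Whatever the outer compression, the data that reaches the second compression stage is the same, which is why the lossless guarantee applies to the underlying data and not to the outer .gz, .bz2 or .xz container.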