Archiving - using --tar

With the --tar option, genozip compresses files directly into a standard tar file.

Each file is compressed independently and written directly into a standard tar file as it is being formed. This is faster and consumes less disk space than first genozipping files and then packaging them into a tar file, since no separate .genozip files are created - just the tar file.
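
To make the comparison concrete, here is a minimal sketch of the two approaches, assuming three input files named as in Example 1 below. The function names archive_two_step and archive_direct are hypothetical and serve only to label the alternatives:

```shell
# Two-step approach: intermediate .genozip files are written to disk first,
# then packaged into a tar file and removed.
archive_two_step() {
    genozip sample1.bam sample2.bam variants.vcf &&
    tar cf mydata.tar sample1.bam.genozip sample2.bam.genozip variants.vcf.genozip &&
    rm sample1.bam.genozip sample2.bam.genozip variants.vcf.genozip
}

# Direct approach: compressed data is written straight into the tar file,
# so no separate .genozip files are ever created.
archive_direct() {
    genozip --tar mydata.tar sample1.bam sample2.bam variants.vcf
}
```

Both are meant to end with a mydata.tar holding the same three .genozip members; the direct form skips the intermediate files and the cleanup step.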

Example 1:

> # Compressing
> genozip --tar mydata.tar sample1.bam sample2.bam variants.vcf

> # Listing the contents of the tar file
> tar tvf mydata.tar
-rw-rw-rw- USER/USER   3424847 2021-06-01 11:34 sample1.bam.genozip
-rw-rw-rw- USER/USER   6765323 2021-03-04 22:04 sample2.bam.genozip
-rw-rw-rw- USER/USER    765323 2021-03-04 22:08 variants.vcf.genozip

> # Unarchiving and decompressing all files
> tar xvf mydata.tar |& genounzip --files-from - --replace
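
The |& operator in the last command is a bash shorthand for 2>&1 |, so in a plain POSIX shell the same pipeline can be spelled out explicitly. A minimal sketch, assuming the same kind of archive as above; the function name unarchive_and_decompress is hypothetical:

```shell
# Hypothetical helper: extract a genozip-created tar file, then decompress
# every extracted member.
# "2>&1 |" pipes both stdout and stderr, so the extracted file names reach
# genounzip on whichever stream the local tar prints its verbose listing.
unarchive_and_decompress() {
    tar xvf "$1" 2>&1 | genounzip --files-from - --replace
}

# Usage:
#   unarchive_and_decompress mydata.tar
```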

Example 2: compress all files in a directory and its sub-directories, using --subdirs:

> genozip --tar mydata.tar --subdirs my-data-dir

Example 3: compress and archive all BAM files in the current directory and its sub-directories, preserving the directory structure:

> find . -name "*.bam" | genozip --tar mydata.tar -T-

Implementation note: Genozip implements the IEEE 1003.1-1988 (“ustar”) standard of tar files, with the size field in binary format for files 8GB or larger. The GNU-tar LongLink extension is used for file names longer than 99 characters. This is compatible with most modern tar implementations, including GNU tar.
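
As a rough illustration of the header layout this note refers to, the sketch below builds a small ustar archive with the system tar (a stand-in, not Genozip itself) and reads two fixed-offset header fields: the size field (12 bytes at offset 124) and the "ustar" magic (at offset 257). It assumes a tar that accepts --format=ustar, as GNU tar and bsdtar do; the file names are hypothetical.

```shell
# Build a tiny ustar archive with the system tar (stand-in for a genozip --tar file).
workdir=$(mktemp -d)
printf 'hello' > "$workdir/demo.txt"     # hypothetical 5-byte member
tar --format=ustar -cf "$workdir/demo.tar" -C "$workdir" demo.txt

# ustar header layout: name at offset 0 (100 bytes), size at offset 124
# (12 bytes, of which the first 11 are octal digits), magic at offset 257.
size_octal=$(dd if="$workdir/demo.tar" bs=1 skip=124 count=11 2>/dev/null)
magic=$(dd if="$workdir/demo.tar" bs=1 skip=257 count=5 2>/dev/null)

# For small members the size is stored as octal text; the extra leading 0
# makes shell arithmetic interpret the digits as octal.
size_bytes=$((0$size_octal))
echo "magic=$magic size=$size_bytes bytes"
```

Eleven octal digits can express at most 8^11 - 1 = 8,589,934,591 bytes, one byte short of 8GB, which is why larger members need the binary size encoding instead.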

Note: up to v13, Genozip had a separate feature called “binding” that allowed binding several files of the same type into a single Genozip file. This feature has been discontinued as of v14 (except for FASTQ files compressed with --pair). To decompress files compressed with the discontinued binding feature, use Genozip v13.