Display contents or metadata of a file compressed with genozip.

Usage: genocat [options]… [files]…

One or more file names must be given.

Reference-file related options

-e, --reference filename.  Load a reference file prior to decompressing. Required only for files compressed with --reference. When no non-reference file is specified, display the reference data itself (typically used in combination with --regions).

-E, --REFERENCE filename.  With no non-reference file specified, display the reverse complement of the reference data itself. Typically used in combination with --regions.

--show-reference  Show the name and MD5 of the reference file that needs to be provided to uncompress this file.

Subsetting options (options resulting in modified display of the data)

--downsample rate[,shard].  Show only one in every <rate> lines (or reads in the case of FASTQ). The optional <shard> parameter indicates which of the shards is shown. Other subsetting options (if any) are applied to the surviving lines only.
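As an illustration of the selection described above, here is a minimal Python sketch. It is not genocat's implementation: the function name downsample is hypothetical, and it assumes that shards are numbered from 0 and that shard k keeps the lines whose index modulo <rate> equals k.

```python
def downsample(lines, rate, shard=0):
    """Keep one line out of every `rate` lines.

    Assumption (not confirmed by the genocat documentation): shards are
    0-based and shard k keeps the lines at index k, k + rate, k + 2*rate, ...
    """
    if rate < 1 or not 0 <= shard < rate:
        raise ValueError("need rate >= 1 and 0 <= shard < rate")
    return [line for i, line in enumerate(lines) if i % rate == shard]


if __name__ == "__main__":
    lines = [f"line{i}" for i in range(10)]
    print(downsample(lines, 3))      # every third line, first shard
    print(downsample(lines, 3, 1))   # every third line, second shard
```

Under these assumptions the shards are disjoint and together cover every line of the input.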

--interleaved  For FASTQ data compressed with --pair: Show every pair of paired-end FASTQ files with their reads interleaved: first a read from the first file, then a read from the second file, then the next read from the first file, and so on.
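The interleaving order described above can be sketched in Python as follows. This is only an illustration: the function name interleave is hypothetical, and each read is represented as an opaque value rather than a real 4-line FASTQ record.

```python
from itertools import chain


def interleave(reads1, reads2):
    """Alternate reads: reads1[0], reads2[0], reads1[1], reads2[1], ..."""
    if len(reads1) != len(reads2):
        # Assumption: paired files hold the same number of reads.
        raise ValueError("paired files must hold the same number of reads")
    return list(chain.from_iterable(zip(reads1, reads2)))


if __name__ == "__main__":
    print(interleave(["r1/1", "r2/1"], ["r1/2", "r2/2"]))
```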

-r, --regions [^]chr|chr:pos|pos|chr:from-to|chr:from-|chr:-to|from-to|from-|-to|from+len[,...].  (VCF FASTA SAM/BAM GVF 23andMe Chain) Show one or more regions of the file. Examples:

genocat myfile.vcf.genozip -r 22:1000-2000

Positions 1000 to 2000 on contig 22

genocat myfile.sam.genozip -r 22:1000+151

151 bases, starting pos 1000, on contig 22

genocat myfile.vcf.genozip -r -2000,2500-

Two ranges on all contigs

genocat myfile.sam.genozip -r chr21,chr22

Contigs chr21 and chr22 in their entirety

genocat myfile.vcf.genozip -r ^MT,Y

All contigs, excluding MT and Y

genocat myfile.vcf.genozip -r ^-1000

All contigs, excluding positions up to 1000

genocat myfile.fa.genozip -r chrM

Contig chrM

Note: genozip files are indexed automatically during compression. There is no separate indexing step or separate index file.

Note: Indels are considered part of a region if their start position is within the region.

Note: Multiple -r arguments may be specified - this is equivalent to chaining their regions with a comma separator in a single argument.

Note: For FASTA and Chain files, only whole-contig regions are possible.

Note: For Chain files this applies to the source contig (qName).
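To make the region syntax above concrete, here is a small Python sketch that parses a --regions argument into (contig, start, end) tuples. It is a reading of the documented examples, not genocat's parser. The names parse_range and parse_regions are hypothetical, and two assumptions are made: a leading ^ negates the whole comma-separated list, and a bare token without ':', '-' or '+' is treated as a contig name (so the bare "pos" form is not handled). Positions are inclusive, so from+len ends at from + len - 1.

```python
import re

_RANGE = re.compile(r"^(\d*)([-+]?)(\d*)$")


def parse_range(text):
    """Parse 'pos', 'from-to', 'from-', '-to' or 'from+len' into (start, end).

    None stands for an open end. Both ends are inclusive.
    """
    m = _RANGE.match(text)
    if m is None:
        raise ValueError(f"bad range: {text!r}")
    first, sep, second = m.groups()
    if sep == "":                      # single position
        if not first:
            raise ValueError(f"bad range: {text!r}")
        return int(first), int(first)
    if sep == "+":                     # from+len
        if not first or not second:
            raise ValueError(f"bad range: {text!r}")
        return int(first), int(first) + int(second) - 1
    if not first and not second:       # a lone "-"
        raise ValueError(f"bad range: {text!r}")
    return (int(first) if first else None, int(second) if second else None)


def parse_regions(arg):
    """Return (exclude, [(contig, start, end), ...]); contig None = all contigs."""
    exclude = arg.startswith("^")      # assumption: ^ applies to the whole list
    regions = []
    for item in (arg[1:] if exclude else arg).split(","):
        if not item:
            raise ValueError("empty region")
        if ":" in item:
            contig, rng = item.split(":", 1)
            regions.append((contig, *parse_range(rng)))
        elif re.search(r"[-+]", item):  # position range on all contigs
            regions.append((None, *parse_range(item)))
        else:                           # assumption: bare token is a contig
            regions.append((item, None, None))
    return exclude, regions


if __name__ == "__main__":
    for arg in ["22:1000-2000", "22:1000+151", "-2000,2500-", "^MT,Y", "^-1000"]:
        print(arg, "->", parse_regions(arg))
```

Because multiple -r arguments are equivalent to one comma-separated argument, the lists returned for each argument could simply be concatenated.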

-s, --samples [^]sample[,...].  (VCF) Show a subset of samples (individuals). Examples:

genocat myfile.vcf.genozip -s HG00255,HG00256

show two samples

genocat myfile.vcf.genozip -s ^HG00255,HG00256

show all samples except these two

Note: This does not change the INFO data (including the AC and AN tags).

Note: Sample names are case-sensitive.

Note: Multiple -s arguments may be specified - this is equivalent to chaining their samples with a comma separator in a single argument.
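The include and exclude forms of --samples can be sketched in Python as below. This is an illustration only: the function name select_samples is hypothetical, and it assumes that the selected samples are shown in the order in which they appear in the file, which the text above does not state.

```python
def select_samples(file_samples, arg):
    """Apply a --samples style argument to the list of sample names in a file.

    A leading ^ means "all samples except these". Matching is case-sensitive.
    Assumption: output keeps the file's sample order.
    """
    exclude = arg.startswith("^")
    names = set((arg[1:] if exclude else arg).split(","))
    if exclude:
        return [s for s in file_samples if s not in names]
    return [s for s in file_samples if s in names]


if __name__ == "__main__":
    samples = ["HG00254", "HG00255", "HG00256", "HG00257"]
    print(select_samples(samples, "HG00255,HG00256"))
    print(select_samples(samples, "^HG00255,HG00256"))
```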

-g, --grep string.  (FASTQ FASTA) Show only records in which <string> is a case-sensitive substring of the description.
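For FASTQ, the filter described above amounts to a case-sensitive substring test on the first line of each 4-line record. A minimal Python sketch, assuming well-formed 4-line records and using the hypothetical name grep_fastq:

```python
def grep_fastq(lines, needle):
    """Keep whole FASTQ records whose description line contains `needle`.

    Assumption: every record is exactly 4 lines (description, sequence,
    '+' line, qualities); a trailing incomplete record is ignored.
    """
    kept = []
    for i in range(0, len(lines) - len(lines) % 4, 4):
        record = lines[i:i + 4]
        if needle in record[0]:        # `in` on str is case-sensitive
            kept.extend(record)
    return kept


if __name__ == "__main__":
    fastq = ["@read1 lane=1", "ACGT", "+", "IIII",
             "@read2 lane=2", "TTGA", "+", "IIII"]
    print(grep_fastq(fastq, "lane=2"))
```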

-G, --drop-genotypes.  (VCF) Output the data without the samples and FORMAT column.

-H, --no-header.  Don't output the header lines.

-1, --header-one.  (VCF FASTA) VCF: Output only the last line of the header (the line with the field and sample names). FASTA: Output the sequence name up to the first space or tab.
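For FASTA, the truncation described above can be sketched in Python as follows. The function name header_one is hypothetical, and keeping the leading '>' in the output is an assumption.

```python
import re


def header_one(description_line):
    """Cut a FASTA description line at the first space or tab."""
    return re.split(r"[ \t]", description_line, maxsplit=1)[0]


if __name__ == "__main__":
    print(header_one(">chr1 Homo sapiens chromosome 1"))
```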

--header-only.  Output only the header lines.

--GT-only.  (VCF) Within samples output only genotype (GT) data - dropping the other subfields.

--sequential.  (FASTA) Output in sequential format - each sequence in a single line.
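The sequential layout described above, with each sequence joined onto a single line, can be sketched in Python like this. It is an illustration rather than genocat's code, and the function name to_sequential is hypothetical.

```python
def to_sequential(fasta_text):
    """Join the wrapped sequence lines of each FASTA record into one line."""
    out, seq = [], []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if seq:                    # flush the previous record's sequence
                out.append("".join(seq))
                seq = []
            out.append(line)
        elif line.strip():
            seq.append(line.strip())
    if seq:
        out.append("".join(seq))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(to_sequential(">seq1\nACGT\nTTGA\n>seq2\nGG\nCC\n"), end="")
```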

Analysis options

--list-chroms.  (VCF SAM BAM FASTA GVF 23andMe) List the names of the chromosomes (or contigs) included in the file.

--show-sex.  (SAM BAM) Determine whether the sample in a SAM/BAM file is male or female. See the "Sex assignment" use case.

--show-coverage[=all].  (SAM BAM) Shows the coverage and depth of each contig. Without =all it shows only contigs that are chromosomes and groups the other contigs under "Other contigs". See "Coverage and Depth" use case.

--show-coverage-chrom.  (SAM BAM) Same as --show-coverage but shows only contigs that are chromosomes.

Translation options (conversion from one format to another)

--bam  (SAM and BAM only) Output as BAM. Note: this option is implicit if --output specifies a filename ending with .bam

--sam  (SAM and BAM only) Output as SAM. This option is the default in genocat on SAM and BAM data.

--no-PG  (SAM and BAM only) When converting a file from SAM to BAM or vice versa, Genozip normally adds a @PG line to the header. With this option, it doesn't.

--fastq  (SAM and BAM only) Output as FASTQ. The alignments are output as FASTQ reads in the order they appear in the SAM/BAM file. Alignments with FLAG 16 (reverse complemented) have their SEQ reverse complemented and their QUAL reversed. Alignments with FLAG 4 (unmapped) or 256 (secondary) are dropped. Alignments with FLAG 64 (or 128) (the first (or last) segment in the template) have a '1' (or '2') added after the read name. Usually (if the original order of the SAM/BAM file has not been tampered with) this results in a valid interleaved FASTQ file. Note: this option is implicit if --output specifies a filename ending with .fq[.gz] or .fastq[.gz]
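The FLAG handling described above can be sketched in Python as follows. This is an illustration, not genocat's implementation. The function name sam_to_fastq is hypothetical, and the "/" separator placed before the added '1' or '2' is an assumption; the text above only says that the digit is added after the read name.

```python
# Complement table for the bases a read typically contains.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(qname, flag, seq, qual):
    """Convert one alignment to a FASTQ record, or None if it is dropped."""
    if flag & 4 or flag & 256:         # unmapped or secondary: dropped
        return None
    if flag & 16:                      # reverse strand: restore the original read
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & 64:                      # first segment in the template
        qname += "/1"                  # assumption: "/" separator
    elif flag & 128:                   # last segment in the template
        qname += "/2"
    return f"@{qname}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    print(sam_to_fastq("read1", 16 | 64, "AACG", "ABCD"), end="")
```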

--bcf  (VCF only) Output as BCF. Note: bcftools needs to be installed for this option to work.

--phylip  (FASTA only) Output a Multi-FASTA in Phylip format. All sequences must be the same length.

--fasta  (Phylip only) Output as Multi-FASTA.

--vcf  (23andMe only) Output as VCF. --vcf must be used in combination with --reference to specify the reference file as listed in the header of the 23andMe file (usually this is GRCh37). Note: INDEL genotypes ('DD' 'DI' 'II') as well as uncalled sites ('--') are discarded.

General options

-c, --stdout  Send output to standard output instead of a file.

-f, --force  Force overwrite of the output file.

-z, --bgzf level.  Compress the output to the BGZF format (.gz extension) using libdeflate. The argument specifies the compression level, from 0 (no compression) to 12 (best yet slowest compression). If you are not sure what value to choose, 6 is a popular option. Note: by default (absent this option) genozip attempts to re-create the same BGZF compression as in the original file. Whether it succeeds in re-creating the exact same BGZF compression ratio depends on the compression library used by the application that generated the original file.

-^, --replace  Replace the source file with the result file rather than leaving it unchanged.

-o, --output output-filename.  Output to this filename.

-p, --password password.  Provide password to access file(s) that were compressed with --password.

-x, --index  Create an index file alongside the decompressed file. The index file is created as described:

Data type      Tool used
               samtools index
               samtools faidx
               samtools faidx
               bcftools index
Other types    --index not supported

-q, --quiet  Don't show the progress indicator or warnings.

-Q, --noisy  The --quiet option is turned on by default when outputting to the terminal. --noisy stops the suppression of warnings.

-@, --threads number.  Specify the maximum number of threads. By default genozip uses all the threads it needs to maximize usage of all available cores.

-w, --show-stats   Show the internal structure of a genozip file and the associated compression statistics.

-W, --SHOW-STATS   Show more detailed statistics.

--validate  Validates that the file(s) are valid genozip files.

-h, --help[=topic]  Show this help page. Optional topic can be:

list of genozip options

list of genounzip options

list of genocat options

list of genols options

list of developer options

list of possible arguments of --input

-L, --license, --licence  Show the license terms and conditions for this product.

-V, --version  Display Genozip's version number.