Release Notes

Note on versioning:
- Major version change occurs when the Genozip file format is 
extended. Note that a new Genozip version can always 
uncompress files generated by an older Genozip version 
(Genozip is backward-compatible starting v8)
- Minor version changes with bug fixes and minor feature 
- Some minor versions are skipped due to failed deployment 
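
The compatibility rule in the note above can be illustrated with a small Python sketch. It is not part of Genozip: the helper names parse_version and can_decompress are hypothetical, and the check uses the major version only, which is an assumption based on the wording of the note.

```python
from typing import Tuple

# From the note above: Genozip is backward-compatible starting v8.
FIRST_COMPATIBLE_MAJOR = 8


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split a version string such as '13.0.20' into three integers."""
    first, second, third = (int(part) for part in text.split("."))
    return first, second, third


def can_decompress(tool_version: str, file_version: str) -> bool:
    """Hypothetical helper: apply the stated policy using major versions only."""
    tool_major = parse_version(tool_version)[0]
    file_major = parse_version(file_version)[0]
    return file_major >= FIRST_COMPATIBLE_MAJOR and tool_major >= file_major


if __name__ == "__main__":
    print(can_decompress("13.0.20", "8.0.4"))
    print(can_decompress("13.0.20", "7.0.5"))
```

Individual releases list exceptions (for example, the GTShark codec discontinued in 12.0.2), which this sketch ignores.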

13.0.20 13/July/2022
- Bug fixes

13.0.19 19/June/2022
- Bug fixes
- BAM: Better support for OQ:Z 

13.0.18 21/May/2022
- Bug fixes

13.0.17 18/May/2022
- Bug fixes

13.0.16 7/April/2022
- Bug fixes

13.0.15 31/March/2022
- --tar: added support for long filenames

13.0.14 29/March/2022
- Support --tar, where file uids are very large - e.g. when 
pulled from Active Directory using SSSD

13.0.13 25/March/2022
- Bug fixes

13.0.12 8/March/2022
- Bug fixes

13.0.11 23/Jan/2022
- VCF: Better compression of FORMAT/PS and FORMAT/PID

13.0.10 22/Jan/2022
- Bug fixes

13.0.9 21/Jan/2022
- Bug fixes

13.0.8 8/January/2022
- SAM/BAM/FASTQ: Better compression of BGI read names
- New advanced option: --debug-qname

13.0.7 2/January/2022
- SAM/BAM: bowtie2: Better compression of AS and YS fields
- Display warning if excessive dictionary size
- Fix bug where genounzip sometimes output uncompressed files 
when a .gz-recompression was expected

13.0.6 14/December/2021
- Better compression of QNAME (SAM/BAM/FASTQ/Kraken)
- Improve memory efficiency when compressing BAM integer 
arrays (eg B:i)
- New advanced option: --show-ref-diff - see the difference 
between two Genozip reference files.
- New LongR codec for quality scores introduced in 13.0.5 now 
used also in default mode (not just in --best as before)

13.0.5 3/December/2021
- SAM/BAM, FASTQ: Better compression of PacBio and Nanopore 
quality scores (in --best mode)
- VCF: Better compression for FORMAT/PS, FORMAT/GT, FORMAT/GQ, 
- Illumina formats: Better compression for .locs files
- --multiseq (renamed from --multifasta) now works on FASTQ 
files too, in addition to FASTA

13.0.4 19/November/2021
- Better support for EasyBuild installations - see
- new option: --subdirs to recursively compress subdirectories
- SAM/BAM: Better compression of CP:i
- Performance enhancements

13.0.3 15/November/2021
- VCF: better compression of FORMAT/GL in --best
- DVCF: more granular statuses, canceled --ext-ostatus
- CRAM: now compresses as BAM instead of SAM

13.0.2 8/November/2021
- Native (binary) handling of BAM integers --> faster 
compression, up to 2X faster in BAM files with long arrays (eg 
PacBio subreads, IonTorrent)
- Advanced option: --biopsy

13.0.1 4/November/2021
- Significant speed improvement in compression and 
decompression due to faster dict_id->did_i mapping, 
elimination of 
  thread synchronization bottlenecks and integration of fast 
- Faster --fast mode, and better --best mode
- SAM/BAM compression improvements: Better compression of 
MAPQ, MQ:i, XS:i, TLEN, CIGAR, ms:i (biobambam)
- VCF compression improvements: better compression for sample 
fields GL,PP,PL,PRI,GP,DS,AD and files generated by VarScan.
- VCF: Liftover fidelity improvements in DVCF
- VCF: genocat --indels-only now also excludes variants which 
have an INFO/SVTYPE field 
- FASTA compression improvements: Better compression of amino 
acid sequences
- GFF/GFF3: Support more variations of the format, including 
output of Maker and GFF (not only GFF3) output of Ensembl 
- Accept human contig names eg NC_000001.10 as equivalent to 1 
or chr1 (and similarly for human chromosomes 1-22,X,Y)
- Rename advanced option --debug-allthesame -> --debug-generate
- Advanced option: --show-containers can now accept an 
argument for additional output
- Much faster --make-reference
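
The contig-name equivalence mentioned in this release (NC_000001.10, 1 and chr1) can be pictured with the Python sketch below. It is an illustration only, not Genozip's implementation: canonical_chrom is a hypothetical helper, and it relies on the RefSeq convention that accessions NC_000001 to NC_000024 denote human chromosomes 1-22, X and Y.

```python
import re
from typing import Optional

# RefSeq accessions NC_000001..NC_000024, followed by a version suffix.
_REFSEQ = re.compile(r"^NC_0000(\d{2})\.\d+$")


def canonical_chrom(name: str) -> Optional[str]:
    """Return '1'..'22', 'X' or 'Y' for a recognised name, else None."""
    match = _REFSEQ.match(name)
    if match:
        number = int(match.group(1))
        if 1 <= number <= 22:
            return str(number)
        return {23: "X", 24: "Y"}.get(number)
    # Strip an optional 'chr' prefix, then accept 1-22, X or Y.
    bare = name[3:] if name.startswith("chr") else name
    if bare in {"X", "Y"}:
        return bare
    if bare.isdecimal() and 1 <= int(bare) <= 22:
        return str(int(bare))
    return None


if __name__ == "__main__":
    for contig in ("NC_000001.10", "chr1", "1", "NC_000023.11"):
        print(contig, "->", canonical_chrom(contig))
```

Two names are treated as equivalent when canonical_chrom returns the same non-None value for both.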

12.0.42 13/October/2021
- SAM/BAM: better compression for Z5:i, XM:i
- bug fixes

12.0.41 12/October/2021
- SAM/BAM: significantly better compression for sorted (by 
POS) files 

12.0.37 28/September/2021
- SAM/BAM: better NM:i, MD:Z, MQ:i, QNAME compression
- FASTQ: --optimize-DESC now generates read names similar to 
the NCBI format, eg "@sample.6"
- New advanced option: --show-wrong-md

12.0.36 24/September/2021
- New: genozip --match-chrom-to-reference: rewrite contig 
names to match the provided reference file (eg "22"->"chr22"), 
see https://genozip/match-chrom.html
- Chain files can now be subsetted with --regions
- More accurate progress indicator
- Better support for contigs named by accession number, eg 
"GL000192.1", "chrUn_JTFH01001867v2_decoy", "chr4_gl383528_alt"
- New advanced option: --debug-seg
- Advanced option --show-ref-alts renamed --show-chrom2ref
- faster --downsample when used with very large values 
- Relax section 2f of the license
- Removed obsolete --with-chr
- Bug fixes

12.0.34 9/September/2021
- Compile without -march=native for genozip-linux-x86_64 and 
Windows Installer distributions
- Better support for IUPAC "bases" in reference files
- DVCF: detect many-to-one coordinate mapping and generate an 
.overlaps file
- VCF: better INFO/ANN, INFO/CLNHGVS, INFO/CSQ compression
- SAM/BAM/FASTQ/FASTA/Kraken: better compression of QNAME / 
- SAM/BAM: Faster compression and decompression of SA, OA, XA 
with long reads 
- Faster --test
- More robust compression from a URL, even on flaky 
- Improved --STATS report
- bug fixes

12.0.33 31/August/2021
- Fix backward compatibility issue of decompressing FASTQ 
files compressed with --pair in v9.0.12 or earlier 
- FASTQ: genocat --seq-only and genocat --qual-only: output 
only the Sequence / Quality lines
- DVCF: genocat --single-coord: generates single-coordinate 
("normal") VCF. Can be used with or without --luft
- DVCF: genocat --contigs --luft: show the contigs of the 
Luft coordinate 
- FASTA: support --make-reference when contigs in FASTA are 
sequential (i.e. not broken into short lines)
- SAM/BAM/FASTQ/FASTA/Kraken: better compression of QNAME / 
- SAM/BAM/FASTQ: Faster compression of long reads
- bug fixes

12.0.32 26/August/2021
- Faster loading of reference files

12.0.31 23/August/2021
- Better GFF3 compression
- Many minor bug fixes and cosmetics

12.0.30 15/August/2021
- Extended the genocat --fastq option for converting SAM/BAM 
to FASTQ: now --fastq=all emits all the SAM/BAM fields in the 
FASTQ description lines.

12.0.26 13/August/2021
- FASTA improvements: compression improvement for amino acid 
FASTAs;  support --downsample ; support .fas and .frn filename 

12.0.25 9/August/2021
- DVCF: added tag renaming ; RendAlg attribute of ##INFO and 
##FORMAT now enclosed in quotes. 
- VCF: better compression for INFO/CLNDN, INFO/CLNHGVS, 
- When loading implicit reference files with a relative path, 
first try path relative to current directory, and then 
relative to file's directory.

12.0.14 29/July/2021
- FASTQ: support files where the 3rd line is a copy of the 1st 
line, except with '+' prefix instead of '@'
- DVCF: a. lift-over of complex indels ; b. consistent sorting 
of lines that have the same CHROM/POS c. alt chrom names also 
if VCF has no contigs in header
- snips with len > 512K (technical improvement)

- Option change: --grep for fastq now tests the entire read, 
not just the description
- Option change: --interleaved now has an optional parameter 
--interleave=either or --interleave=both (default: both) 
describing how to handle in case of subsetting with eg --grep

12.0.12 24/July/2021
- Better support for compressing gff3 files

12.0.10 + 12.0.11 22/July/2021
- bug fixes

12.0.7 + 12.0.8 16/July/2021
- license and copyright update

12.0.6 15/July/2021
- new option: --tar to archive with genozip. See:
- new option: --files-from/-T - An alternative to providing 
input file names on the command line
- bug fixes

12.0.5 9/July/2021
- Compressing low-coverage SAM/BAM files without a reference: 
better compression ratio, better decompression performance

12.0.4 8/July/2021
- safer implementation of --replace
- updated non-commercial license
- bug fixes

12.0.3 6/July/2021
- new option: --licfile. See

12.0.2 2/July/2021
a. Dual-coordinate VCF files: genozip --chain to create a 
dual-coordinates file ; genocat --luft ("lifted") to see
b. Filtering by taxonomy using kraken2 files - support for 
compressing kraken output files --kraken, --taxid, see:
c. Many other improvements:
- Support genocat --sort for VCF files ; --sort implied for 
dual-coordinates VCF files. Disabled by --unsorted.
- Fixed bug with subsetting samples in VCF (genocat --sample). 
The fix will work on files compressed with 11.0.9 onwards.
- Now, genounzip always unbinds files, and genocat always 
d. New options:
- new option: genocat --component <component-number> - to view 
a single component of a bound file (including one of the fastq 
files in a paired file)
- new option: genocat --lines [start]-[last] - show a subset 
of lines of the file.
- new option: genocat --head [num_lines] - show lines from the 
start of the file.
- new option: genocat --tail [num_lines] - show lines from the 
end of the file.
- new option: genocat --count - displays the number of lines 
(or reads in the case of FASTQ), that
  survived any filters applied (--regions, --grep, --taxid, 
--FLAGS, --MAPQ, --bases, --component, --one-vb etc)
- new option: --show-filename - Show the file name for each 
- new option: genocat --FLAG - filters a SAM or BAM file by 
the value of the FLAG field
- new option: genocat --MAPQ - filters a SAM or BAM file by 
the value of the MAPQ field
- new option: genocat --bases - filters a SAM/BAM/FASTQ for 
SEQ base values
- new option: --echo displays the command line and a timestamp 
upon completion of execution (successful or failed)
- new option: genocat --regions-file - reading regions from a 
file as an alternative to --regions
- new option: genocat --show-chain - for chain files - show 
chain file alignments
- new option: genocat --show-chain-contigs - for chain files - 
show contig list
- new option: genocat --with-chr - for chain files - changes 
eg 22->chr22 and MT->chrM for all qNames  
- discontinued support for GTShark codec - use genozip v11 to 
decompress old VCF files compressed with --gtshark
- VCF: better compression of FORMAT/F2R1, INFO/MLEAC, INFO/AA, 
e. Option changes:
- genozip - restructured optimization options for VCF: 
--optimize-phred, --GL-to-PL, --GP-to-PP
- genounzip now always unbinds files, and the --unbind option is 
canceled. The --prefix option can now be used to set a prefix.
- genocat --fastq will NOT add /1 or /2 to R1 and R2 reads in 
the case that --FLAG is specified as well
- genocat --grep now works with most data types, --grep-w 
restricts to whole words
- genocat --samples now also accepts a number - "--samples 5" 
shows the first 5 samples
- genocat --validate=valid displays filenames that are valid 
genozip files. No change when used without "=valid".
- genols --unbind (-u) option is renamed --list (-l). genozip 
--list option is canceled.
- --sex, --coverage, --stats and --STATS replace --show-sex, 
--show-coverage --show-stats, --SHOW-STATS respectively
- --chroms, and --contigs are accepted as alternative names 
for --list-chroms
- Setting the environment variable GENOZIP_REFERENCE is now 
equivalent to --reference
- Default number of threads is now 75% of cores for Windows 
and Mac and 110% of cores for Linux (modifiable with --threads)
f. New advanced options:
- new option: genocat --show-dvcf - shows line-by-line result 
of the liftover (applied to a dual coordinate VCF file)
- new option: genozip --show-kraken - used in combination with 
- new option: --show-uncompress. Shows uncompressing of 
section data.
- new option: --show-flags. Shows internal flags after 
- new option: --show-plan. Shows reconstruction plan
- new option: --show-ref-iupacs. Show non-ACGT iupac codes in 
a reference file
- new option: --debug-stats. For debugging development of 
- new option: --debug-allthesame. For debugging development of 
the allthesame algorithm
g. Much improved website
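
The default thread count described under "Option changes" in this release (75% of cores on Windows and Mac, 110% on Linux) can be sketched in Python as follows. The function name, the platform strings, the rounding down and the minimum of one thread are all assumptions made for illustration; the notes do not state how Genozip rounds.

```python
def default_threads(cores: int, platform: str) -> int:
    """Hypothetical helper: default thread count from the core count.

    platform is assumed to be 'linux', 'windows' or 'mac'.
    """
    percent = 110 if platform == "linux" else 75
    # Integer arithmetic avoids floating-point surprises; rounding down
    # and the floor of one thread are assumptions, not documented behavior.
    return max(1, cores * percent // 100)


if __name__ == "__main__":
    for system in ("linux", "windows", "mac"):
        print(system, default_threads(20, system))
```

As the note says, the actual value can be overridden with --threads.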

11.0.11 24/March/2021
- Fix bug with concatenating multiple files with genocat
- genocat <file1> <file2>.... now will show the header of only 
the first file. To show headers of all files use 
  genounzip --stdout instead.

11.0.9  20/March/2021
- Fix bug with --downsample in combination with --interleaved

11.0.8  9/March/2021
- Added sharding with genocat --downsample <rate>,<shard>
- Added support for compressing UCSC chain files
- Better --show-coverage
- Bug fixes
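
The idea of sharding with --downsample <rate>,<shard> can be illustrated with the Python sketch below. It is a guess at the semantics, not Genozip's code: it assumes that each shard keeps a different line out of every group of <rate> lines, so that the shards are disjoint and together cover the whole file.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def downsample(items: Iterable[T], rate: int, shard: int = 0) -> Iterator[T]:
    """Keep one item out of every `rate`, selecting position `shard`.

    Which position within each group is kept is an assumption.
    """
    if rate < 1 or not 0 <= shard < rate:
        raise ValueError("need rate >= 1 and 0 <= shard < rate")
    for index, item in enumerate(items):
        if index % rate == shard:
            yield item


if __name__ == "__main__":
    lines = [f"line{i}" for i in range(10)]
    for shard in range(3):
        print(shard, list(downsample(lines, 3, shard)))
```

Under this assumption, running the same rate with every shard value in turn splits the input into non-overlapping subsets.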

11.0.7  5/March/2021
- Added genozip --idxstats - identical output to samtools 
- Much improved genocat --show-sex and --show-coverage
- Bug fixes

11.0.6  2/March/2021
- Added genocat --show-coverage and --show-coverage-chrom
- Added "Male-XXY" result to genocat --show-sex
- Added genocat --validate
- Better hash table sizing algorithm - reduced memory 
- Developer tools: Added <bytes> option to 
- Developer: add kill -USR1 - --show-memory of a running 
- Bug fixes

11.0.5  27/Feb/2021
- Added genocat --show-sex for sex assignment of a SAM/BAM file
- Improve Windows installer
- Windows: add genozip directory to Path in registry, if not 
already there
- Bug fixes

11.0.4  20/Feb/2021
- bug fixes

11.0.3  20/Feb/2021
- Added registration requirement to the non-commercial license 
- Bug fixes
- windows installer relocated from windows/ to docs/

11.0.2  13/Feb/2021
- Bug fixes

11.0.0  11/Feb/2021
- VCF: introduce a PBWT based codec for compression of the 
haplotype matrix. Retire hapmat and gtshark codecs. 
  Backward compatibility is provided for decompressing VCF 
files compressed in earlier versions of genozip with hapmat or 
- SAM: better handling of optional fields SA, OA, XA
- Better memory management in Linux
- Reduce core oversubscription from 1.4 to 1.2
- Add --multifasta option for better compression of a FASTA 
where the contigs are quite similar to each other

10.0.9  11/Jan/2021
- bug fixes

10.0.8  10/Jan/2021
- VCF: better handling of INFO/SF

10.0.5  8/Jan/2021
- VCF: better handling of FORMAT fields DP, AD, ADF, ADR, 
AD_ALL, PL and INFO fields DP, BaseCounts

10.0.4  8/Jan/2021
- VCF: better handling of FORMAT/DS
- Bug fixes

10.0.3  7/Jan/2021
- Better --gtshark mode for VCF

10.0.2  7/Jan/2021
- Better memory usage in ZIP (canceled Context.node_i)
- Better handling of VCF haplotype matrices with heterogeneous 
- Bug fixes

10.0.0  31/Dec/2020
- Increased MAX_FIELDS from 64 to 2048. This sets the maximum 
number of INFO and FORMAT tags in VCF, 
  maximum number of optional fields in SAM/BAM and maximum 
- Set size of vblock dynamically
- VCF: support FORMAT tags that begin with a character other 
than a letter (eg a digit)
- VCF: better handling of INFO arrays
- VCF: better handling of VEP fields: CSQ, DP_HIST, GQ_HIST, 
- VCF: better handling of FORMAT/DP and FORMAT/GQ - transposed 
- Several other bug fixes
- Backward compatible with Genozip 8 and 9 - v8 and 
v9-compressed files can be read by v10

9.0.22  28/Dec/2020
- more consistent --bgzf, --sam, --bam behavior in genocat
- better --stats
- minor bug fixes

bug fixes, including major bug with mc:i optional field in SAM

9.0.20  27/Dec/2020
- allow --output to a named pipe (fifo) (not available on 
- genounzip --bgzf now requires a level parameter (0 to 12). 0 
means no compression, and hence --plain flag is canceled.
- bug fixes 

9.0.17  20/Dec/2020
- refactor access to the reference file - to using memory 
mapping and cache files - a lot faster and consumes less memory
- when compressing a .gz (or BAM), test BGZF blocks against 
zlib too (with all compression levels), in addition to 
- append /1 and /2 to the qname in fastq files in both 
--interleave of a paired fastq file and --fastq of a sam/bam 
- renamed --test-seg to --seg-only
- bug fixes 

9.0.15  14/Dec/2020
- bug fixes and minor improvements

9.0.14  12/Dec/2020
- added better selection of stdout vs stderr for messages 
- bug fixes

9.0.13  10/Dec/2020
- added genocat --interleave: displays pairs of FASTQ files 
compressed with --pair with their reads interleaved.
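
What interleaving the reads of a FASTQ pair means can be shown with a short Python sketch. It works on plain lists of lines and assumes four lines per read in both inputs; the function names are hypothetical and it says nothing about how genocat implements --interleave internally.

```python
from typing import Iterable, Iterator, List

Record = List[str]  # the four lines of one FASTQ read


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group FASTQ lines into 4-line records (assumes well-formed input)."""
    buffer: Record = []
    for line in lines:
        buffer.append(line.rstrip("\n"))
        if len(buffer) == 4:
            yield buffer
            buffer = []


def interleave(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> Iterator[str]:
    """Yield read 1 of R1, read 1 of R2, read 2 of R1, read 2 of R2, ..."""
    for first, second in zip(read_records(r1_lines), read_records(r2_lines)):
        yield from first
        yield from second


if __name__ == "__main__":
    r1 = ["@a", "AC", "+", "II", "@b", "GT", "+", "II"]
    r2 = ["@a", "TT", "+", "JJ", "@b", "CC", "+", "JJ"]
    print("\n".join(interleave(r1, r2)))
```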

9.0.12  8/Dec/2020
- bug fix

9.0.11  7/Dec/2020
- Fixed critical bug introduced in 9.0.0 in which FASTQ files 
that were compressed with BGZF (i.e. fq.gz),
  and genozipped with --pair, did not compress correctly
- Added Phylip data type
- genozip --pair       can now compress any number of fastq 
files - every 2 consecutive files are considered a pair
- genocat --header-one now works of FASTA too: Output the 
sequence name up to the first space or tab
- genocat --phylip     new translator - outputs a multi-fasta 
file in Phylip format
- genocat --fasta      new translator - outputs a Phylip file 
in multi-fasta format
- bug fixes
- Developer options:
    --xthreads Use only 1 thread for the main PIZ/ZIP 
dispatcher. This doesn't affect thread use of other dispatchers
    --show-headers now accepts a section-type as an optional 

9.0.10  2/Dec/2020
- Added the --index option for genounzip / genocat to create 
an index file alongside the decompressed file

9.0.8-9  2/Dec/2020
- bug fixes

9.0.7  1/Dec/2020
- New flags:
    genocat --downsample <rate> - show only one in every X 
lines (or reads)
    genocat --one-vb <vb>       - show data from a single VB
- bug fixes

9.0.1-6 1/Dec/2020
- bug fixes and minor improvements

9.0.0 29/Nov/2020
- Native compression of BAM (no longer using samtools for BAM)
- Native reading and writing of BGZF data
- New data type: "generic" for compressing any file beyond our 
supported genomic formats
- Framework supports file translations SAM->BAM, BAM->SAM, 
- Framework supports binary source files
- Backwards compatible with v8 - v8-compressed files can be 
read by v9
- When decompressing a file that was originally compressed 
with BGZF (eg BAM, fq.gz...) - the BGZF blocks are 
  with an attempt to guess the original compression level
- File is now always verified - if md5 is not selected, then 
Adler32 is used

- New / changed flags: 
    --sam (new flag) for genounzip/genocat - reconstruct a 
SAM/BAM file as SAM
    --bam (new flag) for genounzip/genocat - reconstruct a 
SAM/BAM file as BAM
    --no-PG (new flag) refrain from adding a @PG record to the 
header when converting SAM->BAM or BAM->SAM
    --fastq (new flag) for genounzip/genocat - reconstruct a 
    --vcf (new flag) genounzip/genocat - reconstruct a 23andMe 
file as a VCF
    --plain (new flag) in genounzip / genocat - negates 
implicit --bgzf
    --dump-local and --dump-b250 (renamed from dump-one-local 
and dump-one-b250) now output a file per VB
    --bytes (new flag) for genols - show sizes in bytes
    --dump-section (new flag)
    --show-bgzf (new flag) for genozip - show bgzf blocks
    --show-containers (new flag) for genounzip/genocat - show 
flow of container reconstruction
    --show-time can now accept an optional argument eg. 
    --show-txt-contigs - shows contigs from the SAM/BAM header 
(SQ lines)
    --show-mutex - shows locks and unlocks of a particular 
    --unbind in genols (new flag) - shows the components of 
bound files
    --show-dict and show-b250 now accept an optional parameter 
+ removed --show-one-dict and --show-one-b250
    --show-digest - shows digest (md5 or Adler32) updates
    --stdout - flag canceled for compression (genozip), 
available for decompression (genounzip, genocat, genozip -d)
    --input - renamed from --input-type
Compression improvements:
- For b250 sections that have all the same entry - store the 
entry only once. If the entry is word_index=0, drop the section
- Improvements in codec assignment algorithm, and use it for 
dictionary and some other section types in addition to b250 
and local
- 30% improvement in dictionary size on disk due to 
consolidation of fragments and codec assignment.
- Multi-threaded decompression of dictionaries.
- Speed improvements by having bsc and zlib use libdeflate's 
version of adler32 and crc32

- removed support for Visual C compiler
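
The verification rule in this release (Adler32 is used when md5 is not selected) can be illustrated with Python's standard zlib and hashlib modules. The function below is a hypothetical sketch of the choice between the two digests, not Genozip's implementation, and the hexadecimal output format is an assumption.

```python
import hashlib
import zlib


def digest(data: bytes, use_md5: bool = False) -> str:
    """Return a hex digest: md5 if requested, otherwise Adler32."""
    if use_md5:
        return hashlib.md5(data).hexdigest()
    # zlib.adler32 returns an unsigned 32-bit integer in Python 3;
    # the mask is kept only for clarity.
    return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")


if __name__ == "__main__":
    payload = b"ACGTACGT"
    print("adler32:", digest(payload))
    print("md5:    ", digest(payload, use_md5=True))
```

Adler32 is much cheaper to compute than md5 but is only a 32-bit checksum, which is a plausible reason to keep md5 as an opt-in choice.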

8.0.4 8/Nov/2020
- 10X improvement in --gtshark speed by moving to in-memory 
comms using fifo
- fix thread safety issue in bit_array.c 

8.0.3 23/Oct/2020
- Support samtools with or without --no-PG
- Fix reading and writing BAM files using samtools
- Fix bug in genocat --show-headers
- Add back gtshark as a codec for VCF allele data, --gtshark 

8.0.2 20/Oct/2020
- Bug fixes
- Improved 'genocat --show-headers'

8.0.0 16/Oct/2020
- Added libbsc codec
- Dynamic selection of codec between lzma, bz2, bsc for each 
local and b250 buffer
- --show-ref-seq can now work in combination with --regions in 
- Better license registration flow
- Consume ~0.5GB (for human data) less RAM in genounzip of SAM 
files compressed without a reference 
- In --regions, allow specification of ranges using length eg 
"chr22:1000+151" - equivalent to "chr22:1000-1150"
- Canceled optimize-SEQ (benefits were tiny if any, but it 
slowed down --optimize considerably)
- Added --best to contrast --fast. --best doesn't have any 
additional effect as it's the default mode of genozip.
- Added =prefix option to --unbind, to add a prefix when 
- --reference in genounzip is now optional - will use original 
reference filename absent --reference
- Not backward compatible
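
The start+length form of --regions introduced in this release follows from end = start + length - 1, which is why "chr22:1000+151" matches "chr22:1000-1150". A minimal Python sketch of that arithmetic is shown below; parse_region is a hypothetical helper that handles only the two forms quoted in the note, not the full --regions syntax.

```python
import re
from typing import Tuple

_REGION = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)(?P<sep>[-+])(?P<value>\d+)$"
)


def parse_region(text: str) -> Tuple[str, int, int]:
    """Parse 'chrom:start-end' or 'chrom:start+length' into (chrom, start, end)."""
    match = _REGION.match(text)
    if match is None:
        raise ValueError(f"unsupported region: {text!r}")
    start = int(match["start"])
    value = int(match["value"])
    # With '+', value is a length, so the last position is start + length - 1.
    end = start + value - 1 if match["sep"] == "+" else value
    return match["chrom"], start, end


if __name__ == "__main__":
    print(parse_region("chr22:1000+151"))
    print(parse_region("chr22:1000-1150"))
```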

7.0.5 10/Oct/2020
- Add --show-stats and --SHOW-STATS to genocat/genounzip by 
introducing a new section SEC_STATS ; remove limitation of 
only one file when -w or -W

7.0.4 4/Oct/2020
- Bug fixes

7.0.3 3/Oct/2020
- Bug fixes

7.0.2 2/Oct/2020
- Even better SAM BD/BI codec

7.0.1 29/Sep/2020
- bug fixes
- new --test-seg debug option
- change default number of threads to 1.4 * number of cores

7.0.0 28/Sep/2020
- Re-write the VCF segmenter to use the modern infrastructure 
of recursive data definition. In the process, some little-used
  features were discontinued: --gtshark, --sblocks. Non-GT 
subfields are now compressed as is (not transposed), and each  
  field on its own. Samples as well as the GT field are 
defined as Structured.
- Removed gloptimization - too small of a benefit for 
non-standard code
- Change all data types to be fully recursive starting at 
TOPLEVEL, removing data-type specific reconstruction loop
- Added caching of Structured in PIZ
- Better BD and BI compression for SAM
- Not backward compatible
- Bug fixes

6.0.11 21/Aug/2020
- Bug fixes

6.0.3 19/Aug/2020
- Added new data type for reference files - and an option for 
creating a reference file from a FASTA - --make-reference
- Added compression against reference for FASTQ, SAM and VCF - 
new options --reference and --REFERENCE
- Added --pair to compress pairs of paired-end fastq files 
together, resulting in significantly better compression
- Added Domqual compression method, for handling dominant 
quality scores such as Illumina binned quality scores in FASTQ 
and SAM
- Added ACGT compression codec for nucleotide sequences
- Added support for compressing CRAM files
- Added better compression for FORMAT/PS, INFO/AC, INFO/AF, 
- Added --optimize-DESC for FASTQ optimization
- Added --optimize-SEQ for FASTQ, FASTA, SAM optimization
- Added many options including --list-chroms, 
--dump-one-local, --show-reference, --show-ref-index, 
--show-ref-seq, --show-chrom2ref,
  --show-ref-contigs, --show-ref-hash
- Removed backward compatibility with versions v1 and v5. Use 
genozip version 5 to decompress files of all previous versions.

5.0.9 16/June/2020
- fix bug with compressing VCF / GVF with an INFO / ATTRS 
field of '.'

5.0.7 2/June/2020
- bug fixes

5.0.5 31/May/2020
- Updated license
- Added user registration
- Added full support for compressing SAM/BAM, FASTQ, FASTA, 
GVF and 23andMe files
- Compression improvements for VCF files with any of these:
    1. lots of non-GT FORMAT subfields 
    2. ID data 
    3. END INFO subfield 
    4. MIN_DP FORMAT subfield
- Added genounzip output options: --bcf for VCF files and 
--bam for SAM files
- Added --input-type - tell genozip what type of file this is 
- if re-directing or file has non-standard extension
- Added --stdin-size - tell genozip the size of a redirected 
input file, for faster execution
- Added --show-index for genounzip and genocat - see index 
embedded in a genozip file
- Added --fast option for (a lot) faster compression, with 
(somewhat) reduced compression ratio
- Added --grep for genocat FASTQ
- Added --debug-progress and --show-hash, useful mostly for 
genozip developers
- Reduce default vblock from 128MB to 16MB
- Cancel option --strip
Note: some version numbers are skipped due to failed conda 
builds (every build attempt consumes a version number)

4.0.11 30/March/2020
- bug fixes

4.0.10 28/March/2020
- updated license
- added --header-one to genocat
- query user whether to overwrite an existing file
- better error messages when running external tools
- bug fixes

4.0.9 27/March/2020
- improve performance for --samples --drop-genotypes --gt-only 
--strip and --regions - skip reading and decompressing
  all unneeded sections (previously partially implemented, now 
- bug fixes

4.0.6 25/March/2020
- bug fixes

4.0.4 24/March/2020
- add support for compressing a file directly from a URL
- remove support for 32-bit Windows (it's been broken for a 

4.0.2 23/March/2020
- genozip can now compress .bcf .bcf.gz .bcf.bgz and .xz files
- genounzip can now de-compress into a bgzip-ed .vcf.gz file

4.0.0 21/March/2020
- Fixed a bug that existed in versions 2.x.x and 3.x.x, related to 
an edge case in compression of INFO subfields. 
  Fixing the bug resulted in a corrected file format, as originally 
intended, that is slightly different from that used in v2/3.
  Because of this file format change, we are increasing the 
major version number. Backward compatibility is provided
  for correctly decompressing all files compressed with v2/3.

- VCF files that contain lines with Windows-style line ending 
\r\n will now compress losslessly preserving the line 

3.0.12 20/March/2020
- added genocat --GT-only

3.0.11 20/March/2020
- added genocat --strip

3.0.9 19/March/2020
- bug fixes

3.0.2 18/March/2020
- changed default number of sample blocks from 1024 for 
non-gtshark and 16384 in gtshark to 4096 for both modes.
- bug fixes

3.0.0 17/March/2020
- added --gtshark allowing the final stage of allele 
compression to be done with gtshark (provided it is installed
  on the computer and accessible on the path) instead of the 
default bzlib. This required a change to the genozip 
  file format and hence increment in major version. As usual, 
genozip is backward compatible -
  newer versions of genozip can uncompress files compressed 
with older versions.

2.1.4 16/March/2020
- rewrote the Hash subsystem - 
  (1) by removing a thread synchronization bottleneck, genozip 
now scales better with number of cores (esp better in files 
with very large dictionaries)
  (2) more advanced shared memory management reduces the 
overall memory consumption of hash tables, and allows to make 
them bigger - improving speed
- --show-sections now shows all dictionaries, not just FORMAT 
and INFO
- added optimization for VQSLOD

2.1.3 14/March/2020
- Fixed bug in optimization in GL in --optimize

2.1.2 13/March/2020
- Added --optimize and within it optimization for PL and GL

2.1.1 12/March/2020
- Reduced thread serialization to improve CPU core scalability
- New developer options --show-threads and --debug-memory
- Many bug fixes
- Improved help text

2.1.0 9/March/2020
- Rewrote VCF file data reader to avoid redundant copies and 
passes on the data
- Moved to size-constrained rather than number-of-lines 
constrained variant blocks - change in --vblocks logic.
- Make MD5 calculation non-default, requires --md5. genounzip 
--test possible only if file was compressed with --md5
- Improved memory consumption for large VCFs with a single or 
small number of samples

2.0.0 6/March/2020
- New genozip file format 
- backward compatibility to decompress genozip v1 files
- Columns 1-9 (CHROM to FORMAT) are now put into their own 
dictionaries (except REF and ALT that are compressed together)
- Each INFO tag is its own dictionary
- --vblock for setting the variant block size
- Allow variant blocks larger than 64K and set the default 
variant block size based on the number of samples to balance 
  compression ratio with memory consumption.
- --sblock for setting the sample block size
- change haplotype permutation to keep within sample block 
- create "random access" (index) section 
- new genozip header section with payload that is list of all 
sections - at end of file
- due to random access, .genozip files must be read from a 
file only and can no longer be streamed from stdin during 
genounzip / genocat
- all dictionaries are moved to the end of the genozip file, 
and are read upfront before any VB, to facilitate random 
- genocat --regions to filter specific chromosomes and 
regions. these are accessed via random access
- genocat --samples to see specific samples only
- genocat --no-header to skip showing the VCF header
- genocat --header-only to show only the VCF header
- genocat --drop-genotypes to show only columns CHROM-INFO
- Many new developer --show-* options (see genozip -h -f)
- Better, more compressible B250 encoding
- --test for both genozip (compresses and then tests) and 
genounzip (tests without outputting)
- Support for --output in genocat
- Added --noisy which overrides default --quiet when 
outputting to stdout (using --stdout, or the default in 
- --list can now show metadata for encrypted files too
- Many bug fixes, performance and memory consumption 

1.1.3 7/Feb/2020
- --unbind option - required storing the VCF header of all 
files, and keeping md5 for both the bound file and each 
- Improvement in memory and thread management - to reduce 
memory consumption when compressing very large files (100s of 
GB to TBs)
- Separate --help text for each command
- Optimize MD5 performance (move to 32b and eliminate memory 
- Many bug fixes.