I am part of the team working on the TargetID research project at the University of Malta, which is using RNA-seq and whole genome sequencing to find suitable drug targets that will prevent COVID-19-induced cytokine storms. Our pipelines were generating data at higher rates than our initial estimate, and we needed to free up some storage. Standard file compressors weren’t cutting it, and while researching file-specific compressors I encountered ‘Genozip’. At first I was sceptical of its claimed compression ratios and ability to compress CRAM files, but I was able to reproduce these results on our own files. We have now incorporated Genozip into our pipelines, which has reduced our storage estimate from the initial 205TB to 55TB. I was pleasantly surprised with Genozip and I recommend it to those encountering similar storage space issues.
Dramatic decreases in sequencing costs and advances in analysis techniques have significantly increased the amount of data obtained by NGS. However, the cost of data storage has not decreased in the same way, which is a major problem in sequence analysis. I am currently working as a member of a team at Kyoto University on single-cell and whole-genome analysis projects, and we have been struggling with the rapidly increasing amount of sequence data. Our collaborator introduced Genozip to us, and we carefully verified its performance. We were very impressed by the speed, high compression ratio, high versatility, ease of implementation, and quick support, and are now using Genozip to compress all our BAM and FASTQ files. The speed of compression and decompression is very important for daily use, and although we used to compress files only for long-term storage, we are now compressing files for short-term storage as well, which has resulted in a great reduction in the amount of storage used.
I came across Genozip while googling for a recent program that would manage very large archives of genomic data more efficiently. I decided to give it a try and encountered a bug, which was fixed immediately, with the new version released to conda just hours later. The program works nicely and is really impressive, both for the space it saves compared to former archives and for all the features implemented to handle the created archives swiftly. Genozip is now installed on all the clusters and servers I use. Divon is very active in answering questions and fixing bugs. I really recommend Genozip.