Converting a 23andMe Raw Genetic File to VCF

Data Types: 23andMe

23andMe customers can download their raw genetic data, following these instructions.

However, this data comes in a proprietary 23andMe format.

The exact name of the downloaded 23andMe file may vary; the commands below use genome_John_Doe_v3_Full_20190101201010 as an example base name.

Here, we explain how to convert the file to the standard VCF format.
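To make the starting point concrete, here is a minimal Python sketch of reading a 23andMe raw data file. It assumes the commonly seen layout: comment lines beginning with '#', followed by tab-separated rsid, chromosome, position and genotype columns. The names Call and parse_line, and the sample rsID, are hypothetical and used only for illustration.

```python
from typing import NamedTuple, Optional


class Call(NamedTuple):
    """One genotype call from a 23andMe raw data file (illustrative type)."""
    rsid: str
    chrom: str
    pos: int
    genotype: str


def parse_line(line: str) -> Optional[Call]:
    """Parse one line; return None for comment lines and blank lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    # Assumed column order: rsid, chromosome, position, genotype
    rsid, chrom, pos, genotype = line.split("\t")
    return Call(rsid, chrom, int(pos), genotype)


# Made-up sample data, not taken from a real file
sample = "# rsid\tchromosome\tposition\tgenotype\nrs0000001\t1\t1000\tAG\n"
calls = [c for c in map(parse_line, sample.splitlines()) if c is not None]
print(calls)
```

Note that the file carries only the observed genotype, not the reference allele, which is why the conversion to VCF needs a reference genome.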

Step 1: Download a reference file. Any version of hg19 or GRCh37 will do, for example hs37d5.fa.gz. This file is quite large: approximately 900 MB.

Step 2: Create a Genozip reference file. This takes about 10 minutes to run:

genozip --make-reference hs37d5.fa.gz

Step 3: Compress your 23andMe file with Genozip:


Step 4: Convert the file to VCF:

genocat --reference hs37d5.ref.genozip --vcf genome_John_Doe_v3_Full_20190101201010.genozip --output mydata.vcf
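The conversion needs a reference because VCF records state a REF allele and describe each genotype relative to it. The Python sketch below illustrates that idea for single-nucleotide genotypes. It is not Genozip's implementation, and the fields genocat actually writes may differ. The function names and the sample values are hypothetical.

```python
def to_vcf_fields(ref: str, genotype: str):
    """Return (ALT, GT) for a SNP genotype, given the reference base."""
    alts = []      # distinct non-reference alleles, in order of appearance
    indexes = []   # allele index per observed allele: 0 = REF, 1.. = ALT
    for allele in genotype:
        if allele == ref:
            indexes.append(0)
        else:
            if allele not in alts:
                alts.append(allele)
            indexes.append(alts.index(allele) + 1)
    alt = ",".join(alts) if alts else "."   # "." means no alternate allele
    gt = "/".join(str(i) for i in indexes)
    return alt, gt


def to_vcf_line(chrom: str, pos: int, rsid: str, ref: str, genotype: str) -> str:
    """Build a tab-separated VCF data line with a single GT sample column."""
    alt, gt = to_vcf_fields(ref, genotype)
    # QUAL, FILTER and INFO are left as "." (missing) in this sketch
    return "\t".join([chrom, str(pos), rsid, ref, alt, ".", ".", ".", "GT", gt])


print(to_vcf_line("1", 1000, "rs0000001", "A", "AG"))
```

In this sketch a heterozygous call such as AG against reference A yields ALT G with genotype 0/1, while a call matching the reference on both alleles yields no ALT allele and genotype 0/0.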

Note: Indel variants ('DD', 'DI', 'II') as well as uncalled sites ('--') are discarded.
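To estimate how many sites this affects, the same rule can be applied to genotype strings directly. The Python sketch below assumes uncalled sites are written as two hyphens and covers only the codes named in the note. The names DISCARDED and is_convertible are hypothetical, and the genotype list is made up.

```python
# Genotype codes that the note says are dropped during conversion
DISCARDED = {"DD", "DI", "II", "--"}


def is_convertible(genotype: str) -> bool:
    """True if the genotype is neither an indel code nor an uncalled site."""
    return genotype not in DISCARDED


# Made-up genotype column values for illustration
genotypes = ["AG", "DD", "--", "CC", "II", "T", "DI"]
kept = [g for g in genotypes if is_convertible(g)]
dropped = len(genotypes) - len(kept)
print(kept, dropped)
```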