Renaming and dropping annotations in a DVCF

At a glance

In some cases, an annotation’s name (rather than its value) changes between the Primary and Luft renditions. This happens upon a REF⇆ALT switch, for annotations whose name refers to the REF or ALT allele (for example, ALT_F1R2), and upon a strand reversal, for annotations whose name refers to the strand (for example, ADF). In other cases, the annotation makes no sense in Luft coordinates and should be dropped entirely. Genozip implements dropping by adding a “DROP_” prefix to the annotation’s name.

Genozip default annotation renaming and dropping

These are the annotations that are renamed by default:

Annotation   Type     Renamed to     Upon
MAX_AF       INFO     DROP_MAX_AF    REF⇆ALT switch
CLNHGVS      INFO     DROP_CLNHGVS   Always
ADF          FORMAT   ADR            Strand reversal
ADR          FORMAT   ADF            Strand reversal
RDF          FORMAT   RDR            Strand reversal
RDR          FORMAT   RDF            Strand reversal
F1R2         FORMAT   F2R1           Strand reversal
F2R1         FORMAT   F1R2           Strand reversal
REF_F1R2     FORMAT   REF_F2R1       Strand reversal
REF_F1R2     FORMAT   ALT_F1R2       REF⇆ALT switch
REF_F1R2     FORMAT   ALT_F2R1       REF⇆ALT switch + strand reversal
ALT_F1R2     FORMAT   ALT_F2R1       Strand reversal
ALT_F1R2     FORMAT   REF_F1R2       REF⇆ALT switch
ALT_F1R2     FORMAT   REF_F2R1       REF⇆ALT switch + strand reversal
REF_F2R1     FORMAT   REF_F1R2       Strand reversal
REF_F2R1     FORMAT   ALT_F2R1       REF⇆ALT switch
REF_F2R1     FORMAT   ALT_F1R2       REF⇆ALT switch + strand reversal
ALT_F2R1     FORMAT   ALT_F1R2       Strand reversal
ALT_F2R1     FORMAT   REF_F2R1       REF⇆ALT switch
ALT_F2R1     FORMAT   REF_F1R2       REF⇆ALT switch + strand reversal
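The REF_/ALT_ F1R2/F2R1 entries in the default list follow a regular pattern: a strand reversal swaps the F1R2 and F2R1 part of the name, and a REF⇆ALT switch swaps the REF and ALT part. The Python sketch below reproduces those entries under that reading. It is an illustration only, not Genozip's implementation; the function name luft_name and its parameters are hypothetical.

```python
# Illustrative sketch only: derives the default Luft names of the
# REF_/ALT_ F1R2/F2R1 FORMAT annotations from the two possible events.
# luft_name and its parameters are hypothetical, not part of Genozip.

ALLELE_SWAP = {"REF": "ALT", "ALT": "REF"}
ORIENTATION_SWAP = {"F1R2": "F2R1", "F2R1": "F1R2"}


def luft_name(name: str, strand_reversal: bool, refalt_switch: bool) -> str:
    """Return the Luft-rendition name of an annotation such as REF_F1R2."""
    allele, orientation = name.split("_")
    if refalt_switch:
        # A REF<->ALT switch exchanges the allele part of the name.
        allele = ALLELE_SWAP[allele]
    if strand_reversal:
        # A strand reversal exchanges the read-orientation part of the name.
        orientation = ORIENTATION_SWAP[orientation]
    return f"{allele}_{orientation}"


print(luft_name("REF_F1R2", strand_reversal=True, refalt_switch=False))  # REF_F2R1
print(luft_name("REF_F1R2", strand_reversal=False, refalt_switch=True))  # ALT_F1R2
print(luft_name("REF_F1R2", strand_reversal=True, refalt_switch=True))   # ALT_F2R1
```

When neither event occurs, the function returns the name unchanged, matching the fact that renaming only happens upon one of the listed events.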

The --dvcf-rename option

Annotations may be renamed by specifying the --dvcf-rename command line option, together with --chain, for example:

genozip myfile.vcf --chain mychain.chain.genozip --dvcf-rename="FORMAT/ADF:STRAND>ADR|REFALT>DROP_ADF"

The argument is a comma-separated list of all annotations that need to be renamed (this example contains only one annotation, FORMAT/ADF):

  • The annotation name (FORMAT/ADF in this case) can be the name only (e.g. ADF), or prefixed with INFO/ or FORMAT/ to resolve ambiguity.

  • The rules for renaming the particular annotation are specified as a | (pipe)-separated list to the right of the colon. In the example above, we have two rules: STRAND>ADR and REFALT>DROP_ADF.

  • Each rule consists of an event and a destination annotation name, separated by a > (greater-than) character. The event can be one of the following four:

Event     Rule activated upon
STRAND    Strand reversal
REFALT    REF⇆ALT switch
TLAFER    Concurrent strand reversal and REF⇆ALT switch
ALWAYS    Always
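To make the argument syntax concrete, the following Python sketch splits a --dvcf-rename argument into its annotations, events and destination names, following the structure described above. It is a minimal illustration, not Genozip's own parser; the function name parse_dvcf_rename and its error handling are assumptions.

```python
# Illustrative sketch only: parses the value of --dvcf-rename as described
# in the text (comma-separated annotations, "|"-separated rules, and
# EVENT>DESTINATION within each rule).
# parse_dvcf_rename is a hypothetical helper, not part of Genozip.

EVENTS = {"STRAND", "REFALT", "TLAFER", "ALWAYS"}


def parse_dvcf_rename(arg: str) -> dict[str, dict[str, str]]:
    """Map each annotation to a dict of {event: destination name}."""
    parsed: dict[str, dict[str, str]] = {}
    for item in arg.split(","):
        # The annotation name is left of the colon, its rules are right of it.
        annotation, colon, rules_text = item.partition(":")
        if not annotation or not colon or not rules_text:
            raise ValueError(f"expected ANNOTATION:RULES, got {item!r}")
        rules: dict[str, str] = {}
        for rule in rules_text.split("|"):
            event, gt, destination = rule.partition(">")
            if event not in EVENTS or not gt or not destination:
                raise ValueError(f"expected EVENT>NAME, got {rule!r}")
            rules[event] = destination
        parsed[annotation] = rules
    return parsed


print(parse_dvcf_rename("FORMAT/ADF:STRAND>ADR|REFALT>DROP_ADF"))
# {'FORMAT/ADF': {'STRAND': 'ADR', 'REFALT': 'DROP_ADF'}}
```

Note that > and | are special characters in most shells, so the argument generally needs to be quoted when typed on a command line.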

The --dvcf-drop option

Annotations may be dropped with the --dvcf-drop command line option, for example:

genozip myfile.vcf --chain mychain.chain.genozip --dvcf-drop=INFO/MAX_AF:REFALT

This is equivalent to:

--dvcf-rename="INFO/MAX_AF:REFALT>DROP_MAX_AF"
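The equivalence between the two options can be sketched in Python as a simple string rewrite. This is only an illustration of the rule stated above for a single ANNOTATION:EVENT pair, as in the example; the helper name drop_to_rename is hypothetical and is not part of Genozip.

```python
# Illustrative sketch only: rewrites a single --dvcf-drop value of the form
# ANNOTATION:EVENT into the equivalent --dvcf-rename value.
# drop_to_rename is a hypothetical helper, not part of Genozip.


def drop_to_rename(drop_arg: str) -> str:
    """Turn e.g. 'INFO/MAX_AF:REFALT' into 'INFO/MAX_AF:REFALT>DROP_MAX_AF'."""
    annotation, colon, event = drop_arg.partition(":")
    if not annotation or not colon or not event:
        raise ValueError(f"expected ANNOTATION:EVENT, got {drop_arg!r}")
    # The DROP_ prefix is added to the bare name, without INFO/ or FORMAT/.
    bare_name = annotation.split("/")[-1]
    return f"{annotation}:{event}>DROP_{bare_name}"


print(drop_to_rename("INFO/MAX_AF:REFALT"))  # INFO/MAX_AF:REFALT>DROP_MAX_AF
```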

To override Genozip’s default renaming of an annotation, rename the annotation to itself, for example:

--dvcf-rename="INFO/MAX_AF:ALWAYS>MAX_AF"

The --show-rename-tags option

The --show-rename-tags option can be used in combination with --chain, or when compressing a DVCF file, to display the list of annotations that are to be renamed.