Detecting tandem repeat variants in coding regions using code-adVNTR

Park J, Bakhtiari M, Popp B, Wiesener M, Bafna V (2022)

Publication Type: Journal article

Publication year: 2022


Volume: 25

Article Number: 104785

Journal Issue: 8

DOI: 10.1016/j.isci.2022.104785


The human genome contains more than one million tandem repeats (TRs), DNA sequences containing multiple approximate copies of a motif repeated contiguously. TRs account for significant genetic variation, with 50+ diseases attributed to changes in motif number. A few diseases have been shown to be caused by small indels in variable number tandem repeats (VNTRs), including poly-cystic kidney disease type 1 (MCKD1) and monogenic type 1 diabetes. However, small indels in VNTRs are largely unexplored, mainly due to the long and complex structure of VNTRs with multiple motifs. We developed a method, code-adVNTR, that utilizes multi-motif hidden Markov models to detect both motif count variation and small indels within VNTRs. In simulated data, code-adVNTR outperformed GATK-HaplotypeCaller in calling small indels within large VNTRs. We used code-adVNTR to characterize coding VNTRs in the 1000 Genomes data, identifying many population-specific variants, and to reliably call MUC1 mutations for MCKD1.


How to cite


Park, J., Bakhtiari, M., Popp, B., Wiesener, M., & Bafna, V. (2022). Detecting tandem repeat variants in coding regions using code-adVNTR. iScience, 25(8).


Park, Jonghun, et al. "Detecting tandem repeat variants in coding regions using code-adVNTR." iScience 25.8 (2022).
