RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci

Genome Biol. 2024 Jan 31;25(1):39. doi: 10.1186/s13059-024-03171-4.

Abstract

Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.
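
To make the reported metrics concrete, the following minimal sketch shows how precision and recall are computed for an ensemble of two classifiers. It is illustrative only: the abstract does not state which models are combined or how, so the agreement rule in `ensemble_call`, the 0.5 threshold, and the toy scores and labels are all assumptions rather than RExPRT's actual method or data.

```python
from typing import List, Tuple


def ensemble_call(score_a: float, score_b: float, threshold: float = 0.5) -> bool:
    """Hypothetical combination rule: call a locus pathogenic only when
    both model scores reach the threshold (an assumption, not RExPRT's rule)."""
    return score_a >= threshold and score_b >= threshold


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Return (precision, recall) for boolean pathogenic/benign calls."""
    tp = sum(p and a for p, a in zip(predicted, actual))      # true positives
    fp = sum(p and not a for p, a in zip(predicted, actual))  # false positives
    fn = sum(a and not p for p, a in zip(predicted, actual))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy scores from two hypothetical models, one pair per TR locus.
scores = [(0.9, 0.8), (0.7, 0.4), (0.6, 0.9), (0.2, 0.1), (0.8, 0.7), (0.3, 0.6)]
# Toy ground truth: True = pathogenic, False = benign.
labels = [True, True, True, False, False, False]

predicted = [ensemble_call(a, b) for a, b in scores]
precision, recall = precision_recall(predicted, labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Requiring agreement between models tends to favor precision over recall. That trade-off matches the use case described above, where candidate loci are prioritized for costly follow-up studies and false positives are expensive.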

Keywords: Machine learning; Rare diseases; Repeat expansions.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning*
  • Tandem Repeat Sequences*
  • Virulence