Rare disease gene association discovery from burden analysis of the 100,000 Genomes Project data

medRxiv [Preprint]. 2023 Dec 21:2023.12.20.23300294. doi: 10.1101/2023.12.20.23300294.

Abstract

To discover rare disease-gene associations, we developed a gene burden analytical framework and applied it to rare, protein-coding variants from whole genome sequencing of 35,008 cases with rare diseases and their family members recruited to the 100,000 Genomes Project (100KGP). Following in silico triaging of the results, 88 novel associations were identified, including 38 with existing experimental evidence. We have published the confirmation of one of these associations, hereditary ataxia with UCHL1, and independent confirmatory evidence has recently been published for four more. We highlight a further seven compelling associations: hypertrophic cardiomyopathy with DYSF and SLC4A3, where both genes show high/specific heart expression and have existing associations with skeletal dystrophies or short QT syndrome, respectively; monogenic diabetes with UNC13A, which has a known role in the regulation of β cells and a mouse model with impaired glucose tolerance; epilepsy with KCNQ1, where a mouse model shows seizures and the existing long QT syndrome association may be linked; early-onset Parkinson's disease with RYR1, which has existing links to tremor pathophysiology and a mouse model with neurological phenotypes; anterior segment ocular abnormalities with POMK, which shows expression in corneal cells and has a zebrafish model with developmental ocular abnormalities; and cystic kidney disease with COL4A3, which shows high renal expression and has prior evidence for a digenic or modifying role in renal disease. Confirmation of all 88 associations would lead to potential diagnoses in 456 molecularly undiagnosed cases within the 100KGP, as well as in other rare disease patients worldwide, highlighting the clinical impact of a large-scale statistical approach to rare disease gene discovery.

Publication types

  • Preprint