A Machine Learning Model to Aid Detection of Familial Hypercholesterolemia

Jasmine Gratton; Marta Futema; Steve E Humphries; Aroon D Hingorani; Chris Finan; Amand F Schmidt

doi:10.1016/j.jacadv.2023.100333

JACC Adv. 2023 May 24;2(4):100333. doi: 10.1016/j.jacadv.2023.100333. eCollection 2023 Jun.

Authors

Jasmine Gratton¹, Marta Futema¹ ², Steve E Humphries¹, Aroon D Hingorani¹ ³ ⁴, Chris Finan¹ ³ ⁵, Amand F Schmidt¹ ³ ⁵

Affiliations

¹ Institute of Cardiovascular Science, University College London, London, United Kingdom.
² Cardiology Research Centre, Molecular and Clinical Sciences Research Institute, St George's University of London, London, United Kingdom.
³ UCL British Heart Foundation Research Accelerator.
⁴ Health Data Research UK, London, United Kingdom.
⁵ Division Heart and Lungs, Department of Cardiology, University Medical Centre Utrecht, Utrecht University, Utrecht, the Netherlands.

Abstract

Background: People with monogenic familial hypercholesterolemia (FH) are at increased risk of premature coronary heart disease and death. With a prevalence of 1:250, FH is relatively common, but there is currently no population screening strategy in place and most carriers are identified late in life, delaying timely and cost-effective interventions.

Objectives: The purpose of this study was to derive an algorithm to identify people with suspected monogenic FH for subsequent confirmatory genomic testing and cascade screening.

Methods: A least absolute shrinkage and selection operator (LASSO) logistic regression model was used to select predictors that accurately identified people with FH among 139,779 unrelated participants of the UK Biobank. Candidate predictors included information on medical and family history, anthropometric measures, blood biomarkers, and a low-density lipoprotein cholesterol (LDL-C) polygenic score (PGS). Model derivation and evaluation were performed in independent training and testing data.
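As an illustration of the modelling approach named above, the following is a minimal sketch of L1-penalized (LASSO) logistic regression fitted by proximal gradient descent and applied to held-out data. It is not the authors' implementation: the synthetic data, the three standardized features, the penalty `lam`, the step size, and all function names are assumptions made for this example, and the simulated outcome frequency is far higher than the 1:250 prevalence of FH.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def predict_risk(row, intercept, coefs):
    """Predicted probability for one feature vector."""
    return sigmoid(intercept + sum(w * x for w, x in zip(coefs, row)))


def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=300):
    """Minimize mean logistic loss + lam * sum(|coef|) by proximal gradient descent."""
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for row, label in zip(X, y):
            err = predict_risk(row, intercept, coefs) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        intercept -= step * grad_b / n  # the intercept is not penalized
        coefs = [soft_threshold(w - step * g / n, step * lam)
                 for w, g in zip(coefs, grad_w)]
    return intercept, coefs


# Synthetic, standardized data (hypothetical; for illustration only).
rng = random.Random(0)
X, y = [], []
for _ in range(1000):
    row = [rng.gauss(0, 1) for _ in range(3)]
    p_true = sigmoid(-1.0 + 2.0 * row[0] + 1.5 * row[1])  # row[2] is pure noise
    X.append(row)
    y.append(1 if rng.random() < p_true else 0)

# Independent training and testing data, as in the study design.
X_train, y_train = X[:800], y[:800]
X_test, y_test = X[800:], y[800:]

intercept, coefs = fit_lasso_logistic(X_train, y_train, lam=0.08)
test_risk = [predict_risk(row, intercept, coefs) for row in X_test]
print("coefficients:", [round(w, 3) for w in coefs])
```

The soft-thresholding step is what lets the penalty set weak coefficients to exactly zero, which is how a LASSO model reduces a long list of candidate predictors to a short one. In practice the penalty strength would be tuned, for example by cross-validation within the training data.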

Results: A total of 488 FH variant carriers were identified using whole-exome sequencing of the low-density lipoprotein receptor, apolipoprotein B, apolipoprotein E, and proprotein convertase subtilisin/kexin type 9 genes. A 14-variable algorithm for FH was derived, with an area under the curve of 0.77 (95% CI: 0.71-0.83), where the 5 most important variables were triglyceride, LDL-C, and apolipoprotein A1 concentrations, self-reported statin use, and the LDL-C PGS. Excluding the PGS as a candidate feature resulted in a 9-variable model with a comparable area under the curve: 0.76 (95% CI: 0.71-0.82). Both multivariable models (with and without the PGS) outperformed screening prioritization based on LDL-C adjusted for statin use.
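For readers unfamiliar with the metric, the sketch below computes an area under the curve as the probability that a randomly chosen carrier receives a higher predicted risk than a randomly chosen non-carrier, with a percentile bootstrap confidence interval. The abstract does not state how the reported intervals were obtained, so the bootstrap is an assumption, and the scores, labels, and function names are hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling individuals with replacement)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical predicted risks and carrier status (1 = carrier).
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.1]
toy_labels = [1, 0, 1, 0, 0, 0]

point = auc(toy_scores, toy_labels)
lower, upper = bootstrap_auc_ci(toy_scores, toy_labels)
print(f"AUC = {point:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

The pairwise definition is quadratic in sample size and is used here only for transparency; a rank-based formula gives the same value far more cheaply on data of biobank scale.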

Conclusions: Detecting individuals with FH can be improved by considering additional predictors. This would reduce the sequencing burden in a 2-stage population screening strategy for FH.
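The idea of a 2-stage strategy can be sketched as follows: stage 1 ranks people by predicted risk, and stage 2 sequences only the top-ranked fraction. The example reports how many carriers that fraction would capture. The function name, the threshold, and the toy data are hypothetical and do not reproduce any result from the study.

```python
import math


def sequencing_yield(scores, is_carrier, top_fraction):
    """Sequence the top-ranked fraction by predicted risk and summarize the outcome."""
    ranked = sorted(zip(scores, is_carrier), key=lambda pair: pair[0], reverse=True)
    n_sequenced = max(1, math.ceil(top_fraction * len(ranked)))
    found = sum(flag for _, flag in ranked[:n_sequenced])
    total = sum(is_carrier)
    return {
        "n_sequenced": n_sequenced,
        "carriers_found": found,
        # Share of all carriers captured by sequencing only the top fraction.
        "sensitivity": found / total if total else 0.0,
        # Carriers found per person sequenced.
        "yield": found / n_sequenced,
    }


# Hypothetical predicted risks for 10 people, 2 of whom are carriers.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_carrier = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

summary = sequencing_yield(toy_scores, toy_carrier, top_fraction=0.5)
print(summary)
```

A better-discriminating stage 1 score captures the same share of carriers at a smaller `top_fraction`, which is the sense in which improved prediction reduces the sequencing burden.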

Keywords: FH; UK Biobank; polygenic score; prediction; screening.

Grants and funding

WT_/Wellcome Trust/United Kingdom