Background: Genetic testing can determine family screening strategies and has prognostic and diagnostic value in hypertrophic cardiomyopathy (HCM). However, it can also pose a significant psychosocial burden. Conventional scoring systems offer modest ability to predict genotype positivity. The aim of our study was to develop a novel prediction model for genotype positivity in patients with HCM by applying machine learning (ML) algorithms.
Methods: We constructed 3 ML models using readily available clinical and cardiac imaging data from 102 patients with HCM at Columbia University who had undergone genetic testing (the training set). We validated model performance on 76 patients with HCM from Massachusetts General Hospital (the test set). Within the test set, we compared the areas under the receiver operating characteristic curve (AUROCs) of the ML models against the AUROCs of the Toronto HCM Genotype Score (the Toronto score) and the Mayo HCM Genotype Predictor (the Mayo score) using the DeLong test and net reclassification improvement.
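As an illustration of the AUROC comparison described in the Methods, the following minimal sketch computes a rank-based AUROC for two scoring approaches on the same set of patients. The function name `auroc` and the toy labels and scores are hypothetical and are not study data. The sketch does not implement the DeLong test or net reclassification improvement.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen
    genotype-positive patient receives a higher score than a randomly
    chosen genotype-negative patient (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT study data): 1 = genotype positive.
labels = [1, 0, 1, 0, 0, 1]
ml_scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]            # illustrative ML model output
conventional_scores = [0.8, 0.5, 0.4, 0.6, 0.1, 0.7]  # illustrative conventional score

auc_ml = auroc(labels, ml_scores)
auc_conventional = auroc(labels, conventional_scores)
print(f"ML AUROC: {auc_ml:.2f}, conventional AUROC: {auc_conventional:.2f}")
```

Because both scores are evaluated on the same patients, a paired comparison such as the DeLong test is needed to judge whether the difference in AUROC is statistically significant.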
Results: Overall, 63 of the 178 patients (35%) were genotype positive. The random forest ML model developed in the training set demonstrated an AUROC of 0.92 (95% CI, 0.85-0.99) in predicting genotype positivity in the test set, significantly outperforming the Toronto score (AUROC, 0.77 [95% CI, 0.65-0.90]; P=0.004; net reclassification improvement, P<0.001) and the Mayo score (AUROC, 0.79 [95% CI, 0.67-0.92]; P=0.01; net reclassification improvement, P=0.001). The gradient-boosted decision tree ML model, with an AUROC of 0.87 (95% CI, 0.75-0.99), also achieved significant net reclassification improvement over the Toronto score (P<0.001) and the Mayo score (P=0.03). Compared with the Toronto and Mayo scores, all 3 ML models had higher sensitivity, positive predictive value, and negative predictive value.
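For reference, the sensitivity, positive predictive value, and negative predictive value reported above are derived from a confusion matrix of predicted versus actual genotype status. The following is a minimal sketch of those definitions. The helper name `diagnostic_metrics` and the toy labels and predictions are hypothetical and are not study data.

```python
def diagnostic_metrics(labels, predictions):
    """Return sensitivity, PPV, and NPV for binary labels and predictions
    (1 = genotype positive, 0 = genotype negative)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives

    def ratio(num, den):
        # Undefined when the denominator is zero; report NaN in that case.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # positives correctly identified
        "ppv": ratio(tp, tp + fp),          # predicted positives that are correct
        "npv": ratio(tn, tn + fn),          # predicted negatives that are correct
    }


# Hypothetical toy data (NOT study data).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0, 0, 0]

metrics = diagnostic_metrics(labels, predictions)
print(metrics)
```

Unlike the AUROC, these three metrics depend on the threshold used to convert a continuous score into a positive or negative prediction.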
Conclusions: In an external validation set, our ML models demonstrated a superior ability to predict genotype positivity in patients with HCM compared with conventional scoring systems.
Keywords: cardiomyopathies; genes; genotype; machine learning; mutation.