Background: Cardiac channelopathies such as catecholaminergic polymorphic ventricular tachycardia and long QT syndrome predispose patients to fatal arrhythmias and sudden cardiac death. As genetic testing has become common in clinical practice, variants of uncertain significance (VUS) in genes associated with catecholaminergic polymorphic ventricular tachycardia and long QT syndrome are frequently found. The objective of this study was to predict the pathogenicity of catecholaminergic polymorphic ventricular tachycardia-associated RYR2 VUS and long QT syndrome-associated VUS in KCNQ1, KCNH2, and SCN5A by developing gene-specific machine learning models and assessing them using cross-validation, cellular electrophysiological data, and clinical correlation.
Methods: The GENe-specific EnSemble grId Search framework was developed to identify high-performing machine learning models for RYR2, KCNQ1, KCNH2, and SCN5A using variant- and protein-specific inputs. Final models were applied to datasets of VUS identified from ClinVar and exome sequencing. Whole-cell patch clamp and clinical correlation of selected VUS were performed.
Results: The GENe-specific EnSemble grId Search models outperformed alternative methods, with areas under the receiver operating characteristic curve up to 0.87, average precisions up to 0.83, and calibration slopes as close to the ideal value of 1.0 as 1.04. Blinded voltage-clamp analysis of HEK293T cells expressing 2 predicted pathogenic variants in KCNQ1 each revealed an ≈80% reduction of peak Kv7.1 current compared with wild type. Normal Kv7.1 function was observed in KCNQ1-V241I HEK cells, as predicted. Although KCNQ1-V106D was predicted to be benign, loss of Kv7.1 function was observed in KCNQ1-V106D HEK cells. Clinical correlation supported model predictions for 9 of 10 variants.
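The three performance measures reported above can be made concrete with a minimal pure-Python sketch. It is illustrative only and is not the study's code: the function names and the toy labels and scores are assumptions, and the calibration slope is taken here as the coefficient from a logistic regression of the observed label on the logit of the predicted probability, a common definition that the abstract itself does not spell out.

```python
import math

def auroc(y_true, y_score):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive scores higher; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """Mean of the precision at each rank where a positive is found.
    Simplification: tied scores are not treated specially."""
    ranked = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

def _logit(p, eps=1e-6):
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return math.log(p / (1.0 - p))

def calibration_slope(y_true, y_prob, iters=50):
    """Slope b of the logistic model y ~ a + b * logit(p), fitted by
    Newton-Raphson. b near 1 suggests well-scaled probabilities;
    b < 1 suggests overconfident predictions."""
    x = [_logit(p) for p in y_prob]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * xi     # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return b

# Toy data (hypothetical): 1 = pathogenic, 0 = benign.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(labels, scores)
toy_ap = average_precision(labels, scores)

# Ten hypothetical variants: 1 of 5 low-score and 4 of 5 high-score
# variants are truly pathogenic (observed rates 0.2 and 0.8).
outcomes = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
well_scaled = [0.2] * 5 + [0.8] * 5      # matches the observed rates
overconfident = [0.1] * 5 + [0.9] * 5    # more extreme than observed
slope_ok = calibration_slope(outcomes, well_scaled)
slope_over = calibration_slope(outcomes, overconfident)
```

In this toy setting, predictions that match the observed rates give a slope of 1, while the more extreme predictions give a slope below 1, which is why a reported slope of 1.04 reads as close to ideal.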
Conclusions: Gene-specific machine learning models may have a role in diagnostic analyses after genetic testing by providing high-performance prediction of variant pathogenicity.
Keywords: channelopathies; electrophysiology; logistic models; machine learning; neural networks.