Background: 5q-spinal muscular atrophy (SMA) is now among the 5% of rare diseases worldwide that are treatable. As disease-modifying therapies alter disease progression and patient phenotypes, paediatricians and consulting disciplines face new unknowns in their treatment decisions. Conclusions drawn from historical patient data sets are now of limited use, and new approaches are needed to maintain best standard-of-care practices for this exceptional patient group. Here, we present a data-driven machine learning approach to a rare disease data set to predict SMA-associated scoliosis.
Methods: We collected data from 84 genetically confirmed 5q-SMA patients who had received novel SMA therapies. We performed feature engineering guided by expert domain knowledge, then used correlation and predictive power score (PPS) analyses for feature selection. To test the predictive performance of the selected features, we trained a Random Forest Classifier and evaluated model performance using standard metrics.
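As an illustration of the kind of pipeline described in the Methods, the following minimal sketch computes a PPS-style score for single features and then fits a random forest. It is not the authors' code. The use of scikit-learn, the synthetic data, the feature names (`motor_score`, `noise`) and the helper names (`pps_classification`, `run_demo`) are all assumptions made for illustration. The score follows the commonly described PPS recipe for a categorical target: the cross-validated weighted F1 of a decision tree trained on one feature, normalised against a naive baseline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier


def weighted_f1(y_true, y_pred):
    # zero_division=0 avoids warnings when a class is never predicted.
    return f1_score(y_true, y_pred, average="weighted", zero_division=0)


def pps_classification(x, y, cv=4, seed=0):
    """PPS-style score of one numeric feature x for a categorical target y."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y)

    # Out-of-fold predictions from a single-feature decision tree.
    tree = DecisionTreeClassifier(random_state=seed)
    f1_model = weighted_f1(y, cross_val_predict(tree, x, y, cv=cv))

    # Naive baseline: the better of "always the most frequent class"
    # and "a random shuffle of the target".
    values, counts = np.unique(y, return_counts=True)
    majority = np.full(y.shape, values[np.argmax(counts)])
    shuffled = np.random.default_rng(seed).permutation(y)
    f1_naive = max(weighted_f1(y, majority), weighted_f1(y, shuffled))

    if f1_naive >= 1.0:
        return 0.0
    # Normalise so that 0 means "no better than naive" and 1 means "perfect".
    return max(0.0, (f1_model - f1_naive) / (1.0 - f1_naive))


def run_demo(n=400, seed=0):
    # Synthetic stand-in data (assumption): one informative feature, one noise feature.
    rng = np.random.default_rng(seed)
    motor_score = rng.normal(size=n)
    noise = rng.normal(size=n)
    scoliosis = (motor_score + 0.5 * rng.normal(size=n) < 0).astype(int)

    X = np.column_stack([motor_score, noise])
    X_train, X_test, y_train, y_test = train_test_split(
        X, scoliosis, test_size=0.25, stratify=scoliosis, random_state=seed
    )

    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)
    proba = forest.predict_proba(X_test)[:, 1]

    return {
        "pps_informative": float(pps_classification(motor_score, scoliosis)),
        "pps_noise": float(pps_classification(noise, scoliosis)),
        "accuracy": float(accuracy_score(y_test, forest.predict(X_test))),
        "roc_auc": float(roc_auc_score(y_test, proba)),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.2f}")
```

In this sketch the informative feature should score well above the noise feature, which is the property that makes such a score usable for feature selection. With repeated visits per patient, a patient-grouped split (for example scikit-learn's `GroupKFold`) may be preferable to the simple random split used here.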
Results: The SMA data set consisted of 1304 visits and over 360 variables. We performed feature engineering for variables related to 'interventions', 'devices', 'orthosis', 'ventilation', 'muscle contractures' and 'motor milestones'. Through correlation and PPS analysis paired with expert-guided feature selection, we identified relevant features for scoliosis prediction in SMA. These included the disease progression markers Hammersmith Functional Motor Scale Expanded 'HFMSE' (PPS = 0.27) and 6-Minute Walk Test '6MWT' scores (PPS = 0.44), as well as 'age' (PPS = 0.41), 'weight' (PPS = 0.49), 'contractures' (PPS = 0.17), the use of 'assistive devices' (PPS = 0.39), 'ventilation' (PPS = 0.16) and the presence of 'gastric tubes' (PPS = 0.35) in SMA patients. These features were validated using expert domain knowledge and used to train a Random Forest Classifier, which achieved an accuracy of 0.82 and an average area under the receiver operating characteristic (ROC) curve of 0.87.
Conclusion: The introduction of disease-modifying SMA therapies, followed by the implementation of SMA in newborn screenings, has presented physicians with previously unseen patient phenotypes. We used feature engineering tools to overcome one of the main challenges when using data-driven approaches in rare disease data sets. Through predictive modelling of these data, we defined disease progression markers that are easily assessed during patient visits and can help anticipate scoliosis onset. This highlights the importance of progressive features in the drug-induced revolution of this rare disease and further supports the ongoing efforts to update the SMA classification. We advocate for the consistent documentation of relevant progression markers, which will serve as a basis for data-driven models that physicians can use to update their best standard-of-care practices.
Keywords: feature engineering; gene therapy; machine learning; predictive power score; rare disease; spinal muscular atrophy.
© 2024 The Author(s). Journal of Cachexia, Sarcopenia and Muscle published by Wiley Periodicals LLC.