Accurately determining the chronological age of a sample is an important forensic problem. Performance on this regression task may be improved by selecting appropriate methylomic features. Most existing feature selection algorithms, however, optimize regression performance by considering only the original features. This study proposed four feature engineering strategies to transform the original methylomic features, together with a resampling-based feature selection algorithm, FeSTwo, which improved the performance of the age regression model. FeSTwo outperformed the existing algorithms used in previous studies, even on electronic health record data. The age prediction performance of the FeSTwo-detected features was also confirmed on another independent dataset. The results demonstrated that FeSTwo reduced the root-mean-square error (RMSE) on the test dataset by more than 8% using only 70 features.
Keywords: Age prediction; FeSTwo; Feature engineering; Feature selection; Linear regression; Methylomic biomarker.
Copyright © 2020 Elsevier Ltd. All rights reserved.