Hybrid statistical and machine-learning approach to hearing-loss identification based on an oversampling technique

Comput Biol Med. 2024 Dec 12:185:109539. doi: 10.1016/j.compbiomed.2024.109539. Online ahead of print.

Abstract

Background and objectives: Hearing loss is a major global health concern with considerable social and physiological effects on spoken language and cognition. Patients affected by this condition may experience social and professional hardships that dominate occupational injuries. Identifying the features of recessive hearing loss is therefore important for clinicians seeking to prevent further disease progression. This work aimed to develop a hybrid statistical and machine-learning approach as a decision-support mechanism. We expect the proposed model to help predict hearing-loss disorders and support clinical diagnosis.

Methods: A three-phase hybrid approach was proposed to build classification models. Phase I reduced the number of input variables and selected the most influential features, using a stepwise method and a random forest (RF) technique as filters. Phase II applied the synthetic minority oversampling technique (SMOTE) to oversample the minority class and balance the sample sizes of the target and nontarget classes. Phase III selected the final model from three supervised classifiers, namely logistic regression, multilayer perceptron, and support vector machine (SVM), to identify and predict the case of interest (i.e., hearing loss).
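
The oversampling step in phase II can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority neighbours. This is not the authors' implementation; the function names `smote` and `balance`, the default of five neighbours, and the toy data layout (lists of numeric feature lists) are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Hypothetical helper: `minority` is a list of equal-length numeric lists.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_needed = max(0, n_majority - len(minority))
    new_points = smote(minority, n_needed, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)
```

In common practice, oversampling of this kind is applied to the training split only, so that the evaluation data contain no synthetic samples; whether the study followed this convention is not stated in the abstract.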

Results: In phase I, the stepwise technique and the RF method selected three and seven features, respectively. SMOTE alleviated the class imbalance and substantially improved predictive capability in phases II and III. In terms of accuracy, precision, recall, and F1 score, the proposed hybrid approach combining the SVM method with the stepwise technique was competitive with the logistic model that included all variables. Furthermore, the SVM models combined with the stepwise and RF techniques outperformed the other approaches in terms of the area under the curve (AUC).
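
For reference, the evaluation measures named above can be computed from predicted labels and scores as in the following sketch. It uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as their harmonic mean, and AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one). The function names and the convention of returning 0.0 for an undefined ratio are assumptions for illustration, not details taken from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Assumed convention: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc(y_true, scores, positive=1):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On imbalanced data, accuracy alone can look high even when the minority class is rarely detected, which is why recall, F1 score, and AUC are usually reported alongside it.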

Conclusion: Compared with multivariate models, the hybrid approach that couples the SVM method with a stepwise technique and/or an RF technique is an excellent and more efficient alternative. It requires fewer predictors and remains competitive in terms of accuracy, precision, recall, F1 score, and AUC. This work highlights the potential of hybrid statistical and machine-learning approaches. Our model can be used as a screening tool for upfront forecasting in clinical practice. The proposed hybrid approach also demonstrates a strong capability to identify vital features and predict hearing loss.

Keywords: Feature selection; Hearing loss; Logistic regression; Multilayer perceptron; Support vector machine; Synthetic minority oversampling technique.