A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud

Zeyu Wang; Xiaofang Chen; Yiwei Wu; Linke Jiang; Shiming Lin; Gang Qiu

doi:10.1038/s41598-024-82062-x

Sci Rep. 2025 Jan 2;15(1):218. doi: 10.1038/s41598-024-82062-x.

Authors

Zeyu Wang¹, Xiaofang Chen², Yiwei Wu¹, Linke Jiang¹, Shiming Lin³,⁴, Gang Qiu⁵

Affiliations

¹ School of Informatics, Xiamen University, Xiamen, 361005, Fujian, China.
² Xiang'an Hospital, Xiamen University, Xiamen, 361101, Fujian, China.
³ School of Informatics, Xiamen University, Xiamen, 361005, Fujian, China.
⁴ School of Information Engineering, Changji University, Changji, 831100, Xinjiang, China.
⁵ School of Information Engineering, Changji University, Changji, 831100, Xinjiang, China.

Abstract

Healthcare insurance fraud imposes a significant financial burden on healthcare systems worldwide, with annual losses reaching billions of dollars. This study aims to improve fraud detection accuracy using machine learning techniques. Our approach consists of three key stages: data preprocessing, model training and integration, and result analysis with feature interpretation. Initially, we examined the dataset's characteristics and employed embedded and permutation methods to test the performance and runtime of single models under different feature sets, selecting the minimal number of features that could still achieve high performance. We then applied ensemble techniques, including Voting, Weighted, and Stacking methods, to combine different models and compare their performances. Feature interpretation was achieved through partial dependence plots (PDP), SHAP, and LIME, allowing us to understand each feature's impact on the predictions. Finally, we benchmarked our approach against existing studies to evaluate its advantages and limitations. The findings demonstrate improved fraud detection accuracy and offer insights into the interpretability of machine learning models in this context.
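As a rough illustration of the ensemble step, the sketch below combines the outputs of three base models for a single claim using majority (hard) voting and weighted averaging of predicted fraud probabilities. The function names, probabilities, and weights are hypothetical and are not taken from the study; the abstract does not specify how weights were chosen, and Stacking (which trains a meta-model on base-model outputs) is not shown.

```python
from collections import Counter


def hard_vote(labels):
    """Majority vote over per-model class labels."""
    return Counter(labels).most_common(1)[0][0]


def weighted_soft_vote(probas, weights):
    """Weighted average of per-model fraud probabilities."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probas, weights)) / total


# Hypothetical fraud probabilities from three base models for one claim.
model_probas = [0.9, 0.4, 0.7]

# Threshold each probability at 0.5 to get a class label (1 = fraud).
model_labels = [int(p >= 0.5) for p in model_probas]

# Assumed weights, e.g. proportional to each model's validation score.
model_weights = [0.5, 0.2, 0.3]

majority = hard_vote(model_labels)
weighted = weighted_soft_vote(model_probas, model_weights)

print(f"majority label: {majority}, weighted fraud probability: {weighted:.2f}")
```

With these made-up inputs, two of the three models flag the claim, so the majority vote is "fraud", while the weighted average leans further toward the first model because it carries the largest weight.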

Keywords: Healthcare insurance fraud; Machine learning; Model ensemble; Model interpretability.

MeSH terms

Fraud*
Humans
Insurance, Health* / economics
Machine Learning*
