Purpose: To develop a deep learning model for obstructive sleep apnea (OSA) detection and severity assessment, providing a convenient, economical, and accurate approach to disease detection.
Methods: Considering medical reliability and ease of acquisition, we used electrocardiogram (ECG) and oxygen saturation (SpO2) signals to develop a multiscale Transformer model with multimodal signal fusion for OSA detection and severity assessment. The proposed model comprises signal preprocessing, feature extraction, cross-modal interaction, and classification modules. The model was evaluated on a hospital dataset and two public datasets. The hospital dataset, comprising 510 patients who underwent polysomnography, was used to demonstrate the applicability and generalizability of the model; the public Apnea-ECG dataset (8 recordings) and UCD dataset (21 recordings) were used to compare the results with those of previous studies.
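The paper does not publish implementation details of its cross-modal interaction module, but the general mechanism such modules rely on is cross-attention between the two feature streams. The sketch below is a minimal, hypothetical illustration (plain NumPy, made-up dimensions and function names, not the authors' code): ECG feature vectors attend over SpO2 feature vectors via scaled dot-product attention, producing a fused sequence aligned with the ECG time steps.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(ecg_feats, spo2_feats):
    """Fuse two modalities: each ECG feature step attends to all
    SpO2 feature steps via scaled dot-product attention.
    Shapes: ecg_feats (T_ecg, d), spo2_feats (T_spo2, d) -> (T_ecg, d)."""
    d = ecg_feats.shape[-1]
    scores = ecg_feats @ spo2_feats.T / np.sqrt(d)  # (T_ecg, T_spo2)
    weights = softmax(scores, axis=-1)              # rows sum to 1
    return weights @ spo2_feats                     # SpO2 info mapped onto ECG steps

# Toy features: 30 ECG steps and 10 SpO2 steps, both 16-dimensional.
rng = np.random.default_rng(0)
ecg = rng.standard_normal((30, 16))
spo2 = rng.standard_normal((10, 16))
fused = cross_modal_attention(ecg, spo2)
print(fused.shape)  # (30, 16)
```

In a real model the queries, keys, and values would be learned linear projections and the fused features would feed the classification module; the sketch keeps only the interaction step.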
Results: On the hospital dataset, the accuracy (Acc) values of per-segment and per-recording detection were 91.38% and 96.08%, respectively. The Acc values for mild, moderate, and severe OSA were 90.20%, 88.24%, and 92.16%, respectively. Bland–Altman plots showed agreement between the true and predicted apnea-hypopnea index (AHI). On the public datasets, the per-segment detection Acc values for the Apnea-ECG and UCD datasets were 95.04% and 90.56%, respectively.
Conclusion: Experiments on both hospital and public datasets demonstrated that the proposed model is more accurate and more broadly applicable for OSA detection and severity assessment than previous models.
Keywords: deep learning; detection model; multimodal signal fusion; obstructive sleep apnea.
© 2025 Zhang et al.