Deep Learning for Obstructive Sleep Apnea Detection and Severity Assessment: A Multimodal Signals Fusion Multiscale Transformer Model

Nat Sci Sleep. 2025 Jan 6:17:1-15. doi: 10.2147/NSS.S492806. eCollection 2025.

Abstract

Purpose: To develop a deep learning (DL) model for obstructive sleep apnea (OSA) detection and severity assessment and provide a new approach for convenient, economical, and accurate disease detection.

Methods: Considering medical reliability and ease of acquisition, we used electrocardiogram (ECG) and oxygen saturation (SpO2) signals to develop a multimodal signal fusion multiscale Transformer model for OSA detection and severity assessment. The proposed model comprises signal preprocessing, feature extraction, cross-modal interaction, and classification modules. The model was tested on one hospital dataset and two public datasets. The hospital dataset, which included 510 patients who underwent polysomnography, was used to demonstrate the applicability and generalizability of the model. The two public datasets, the Apnea-ECG dataset (8 recordings) and the UCD dataset (21 recordings), were used to compare the results with those of previous studies.
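The abstract names a cross-modal interaction module but does not describe its internals. The following is a minimal sketch, assuming the interaction is a form of bidirectional cross-attention between ECG and SpO2 token sequences; this is an assumption, not the authors' stated design. The helper names (`cross_attention`, `fuse`, `mean_pool`) are hypothetical, and learned query/key/value projections, multiple heads, the multiscale branches, residual connections, and normalization layers are all omitted for brevity.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: queries come from one modality,
    keys and values from the other (no learned projections here)."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)


def mean_pool(x: Matrix) -> List[float]:
    """Average token vectors into a single vector."""
    return [sum(col) / len(x) for col in zip(*x)]


def fuse(ecg_tokens: Matrix, spo2_tokens: Matrix) -> List[float]:
    """Hypothetical fusion: each modality attends to the other, then the two
    pooled results are concatenated into one feature vector for a classifier."""
    ecg_to_spo2 = cross_attention(ecg_tokens, spo2_tokens, spo2_tokens)
    spo2_to_ecg = cross_attention(spo2_tokens, ecg_tokens, ecg_tokens)
    return mean_pool(ecg_to_spo2) + mean_pool(spo2_to_ecg)


if __name__ == "__main__":
    ecg = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5]]  # 3 ECG tokens, feature dim 2
    spo2 = [[0.9, 0.0], [0.2, 0.4]]             # 2 SpO2 tokens, feature dim 2
    fused = fuse(ecg, spo2)
    print(len(fused))  # 4
```

Attending in both directions lets ECG segments draw on concurrent oxygen desaturation information and vice versa; a real model would stack such layers with trainable weights.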

Results: In the hospital dataset, the accuracy (Acc) values of per-segment and per-recording detection were 91.38% and 96.08%, respectively. The Acc values for mild, moderate, and severe OSA were 90.20%, 88.24%, and 92.16%, respectively. Bland-Altman plots showed agreement between the true apnea-hypopnea index (AHI) and the predicted AHI. In the public datasets, the per-segment detection Acc values for the Apnea-ECG and UCD datasets were 95.04% and 90.56%, respectively.
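The abstract does not state how per-segment predictions are turned into a predicted AHI, a severity grade, or the Bland-Altman comparison. The sketch below assumes the common proxy of counting each apnea-positive segment as one event per fixed-length segment (60 s by default, a hypothetical parameter) and uses the standard AHI cut-offs (normal < 5, mild 5 to < 15, moderate 15 to < 30, severe >= 30 events per hour). The Bland-Altman helper reports the mean difference (bias) and the 95% limits of agreement (bias +/- 1.96 SD). All function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def estimate_ahi(segment_preds: Sequence[int], segment_seconds: float = 60.0) -> float:
    """Approximate AHI as apnea-positive segments per hour of recording.
    Assumption: one positive segment is counted as one event."""
    if not segment_preds:
        raise ValueError("need at least one segment")
    hours = len(segment_preds) * segment_seconds / 3600.0
    return sum(segment_preds) / hours


def severity(ahi: float) -> str:
    """Map an AHI value to a severity grade using standard cut-offs."""
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bland_altman(true_ahi: Sequence[float], pred_ahi: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement for predicted vs true AHI."""
    diffs = [p - t for t, p in zip(true_ahi, pred_ahi)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # 480 one-minute segments (8 h), 100 of them predicted as apnea.
    preds = [1] * 100 + [0] * 380
    ahi = estimate_ahi(preds)
    print(ahi, severity(ahi))  # 12.5 mild
    print(bland_altman([10, 20, 30], [12, 19, 33]))
```

Under this proxy, per-recording detection reduces to checking whether the graded severity is anything other than "normal"; a model that predicts event counts directly would give a more faithful AHI.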

Conclusion: Experiments on the hospital and public datasets demonstrated that the proposed model is more advanced, accurate, and applicable for OSA detection and severity assessment than previous models.

Keywords: deep learning; detection model; multimodal signals fusion; obstructive sleep apnea.

Grants and funding

This work was supported by the National Natural Science Foundation of China (62076198 and 82371129) and the Free Exploration and Innovation Project of the Basic Scientific Research Fund of Xi'an Jiaotong University (xzy012023119). The funding bodies played no role in the design of the study; the collection, analysis, or interpretation of data; or the writing of the manuscript.