Study objectives: To investigate whether a foundational transformer model using 8-hour, multichannel data from polysomnograms can outperform existing artificial intelligence (AI) methods for sleep stage classification.
Methods: We used the Sleep Heart Health Study (SHHS) visits 1 and 2 for training and validation and the Multi-Ethnic Study of Atherosclerosis (MESA) for testing. We trained a self-supervised foundational transformer (PFTSleep) that encodes 8-hour sleep studies sampled at 125 Hz across 7 signals spanning brain, movement, cardiac, oxygen, and respiratory channels. These encodings serve as input for training an additional sleep stage classification model, without adjusting the weights of the foundational transformer. We compared our results to existing AI methods that did not utilize 8-hour data or the full set of signals but did report evaluation metrics on the SHHS dataset.
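As a rough illustration of the input dimensions stated above, the sketch below shapes an 8-hour, 7-channel recording sampled at 125 Hz into scoring epochs. The 30-second epoch length is an assumption (the standard AASM scoring window) and is not taken from this abstract; PFTSleep's actual preprocessing may differ.

```python
import numpy as np

# Dimensions from the abstract: 8 hours of 7-channel PSG at 125 Hz.
HOURS = 8
SAMPLE_RATE_HZ = 125
N_CHANNELS = 7
EPOCH_SECONDS = 30  # assumed standard AASM epoch; not stated in the abstract

n_samples = HOURS * 3600 * SAMPLE_RATE_HZ  # 3,600,000 samples per channel
n_epochs = HOURS * 3600 // EPOCH_SECONDS   # 960 scoring epochs per night

# Simulated recording laid out as (channels, samples).
recording = np.zeros((N_CHANNELS, n_samples), dtype=np.float32)

# Reshape into (epochs, channels, samples-per-epoch) for epoch-level stage labels.
epochs = recording.reshape(N_CHANNELS, n_epochs, -1).transpose(1, 0, 2)
print(epochs.shape)  # (960, 7, 3750)
```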
Results: We trained and validated the model on 8,444 sleep studies with 7 signals spanning brain, movement, cardiac, oxygen, and respiratory channels, and tested it on an additional 2,055 studies, for a total of 587,944 hours of sleep study signal data. On the SHHS validation set, area under the precision-recall curve (AUPRC) scores were 0.82, 0.40, 0.53, 0.75, and 0.82, and area under the receiver operating characteristic curve (AUROC) scores were 0.99, 0.95, 0.96, 0.98, and 0.99 for wake, N1, N2, N3, and REM, respectively. On MESA, the AUPRC scores were 0.56, 0.16, 0.40, 0.45, and 0.65 and the AUROC scores were 0.94, 0.77, 0.87, 0.91, and 0.96, respectively. Compared to the state-of-the-art model with the longest context window, our model showed increases in macro evaluation scores, notably sensitivity (3.7% increase) and multi-class REM (3.39% increase) and wake (0.97% increase) F1 scores.
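For context, the macro scores referenced above are unweighted means over the five stage classes. A minimal illustration using the per-class SHHS validation AUPRCs reported here (the paper's exact macro computation may differ):

```python
# Macro averaging: unweighted mean of per-class scores (wake, N1, N2, N3, REM).
# Values are the SHHS validation AUPRCs reported in this abstract.
auprc = {"wake": 0.82, "N1": 0.40, "N2": 0.53, "N3": 0.75, "REM": 0.82}
macro_auprc = sum(auprc.values()) / len(auprc)
print(round(macro_auprc, 3))  # 0.664
```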
Conclusions: Utilizing full-night, multi-channel PSG encodings derived from a foundational transformer improves sleep stage classification over existing methods.