Depth-Based Intervention Detection in the Neonatal Intensive Care Unit Using Vision Transformers

Sensors (Basel). 2024 Dec 4;24(23):7753. doi: 10.3390/s24237753.

Abstract

Depth cameras can provide an effective, noncontact, and privacy-preserving means to monitor patients in the Neonatal Intensive Care Unit (NICU). Clinical interventions and routine care events can disrupt video-based patient monitoring. Automatically detecting these periods can reduce the time spent hand-annotating recordings, a step required for system development. Moreover, automatic detection could be used in the future for real-time or retrospective classification of intervention events. An intervention detection method based solely on depth data was developed using a vision transformer (ViT) model and real-world data from patients in the NICU. Multiple design parameters were investigated, including the encoding of depth data and a perspective transform to account for nonoptimal camera placement. The best-performing model had ∼85 M trainable parameters, used both the perspective transform and HHA (Horizontal disparity, Height above ground, and Angle with gravity) encoding, and achieved a sensitivity of 85.6%, a precision of 89.8%, and an F1-score of 87.6%.
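The three reported metrics are related: the F1-score is the harmonic mean of precision and sensitivity (recall). The short sketch below is only a consistency check on the figures quoted above, not the authors' evaluation code; the function name `f1_score` is a hypothetical helper introduced here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # conventional value when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages.
precision = 0.898
sensitivity = 0.856  # sensitivity is another name for recall

f1 = f1_score(precision, sensitivity)
print(f"F1 = {f1:.1%}")  # prints "F1 = 87.6%"
```

The result agrees with the reported F1-score of 87.6% to the stated precision.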

Keywords: NICU; ViT; depth camera; intervention detection; neonatal patient monitoring; transformer; vision transformer.

MeSH terms

  • Algorithms
  • Humans
  • Infant, Newborn
  • Intensive Care Units, Neonatal*
  • Monitoring, Physiologic / methods
  • Video Recording / methods