A Reinforcement Learning Model for Optimal Treatment Strategies in Intensive Care: Assessment of the Role of Cardiorespiratory Features

IEEE Open J Eng Med Biol. 2024 Feb 19:5:806-815. doi: 10.1109/OJEMB.2024.3367236. eCollection 2024.

Abstract

Goal: The purpose of this study is to evaluate the importance of cardiorespiratory variables within a Reinforcement Learning (RL) recommendation system aimed at establishing optimal strategies for drug treatment of septic patients in the intensive care unit (ICU). Methods: We developed a RL model in order to establish drug administration strategies for septic patients using only a set of cardiorespiratory variables. We then compared this model with other RL models trained with a different set of features. We selected patients meeting the Sepsis-3 criteria from the Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC III) database, resulting in a total of 20,496 ICU admissions. A Markov Decision Process (MDP) was built on the extracted discrete time-series. A policy iteration algorithm was used to obtain the optimal AI policy for the MDP. The policy performance was then evaluated using the WIS estimator. The process was repeated for each set of variables and compared to a set of baseline benchmark policies. Results: The model trained with cardiorespiratory variables outperformed all other models considered, resulting in a 95% confidence lower bound score of 97.48. This finding highlights the importance of cardiovascular variables in the clinical RL recommendation system. Conclusions: We established an efficient RL model for sepsis treatment in the ICU and demonstrated that cardiorespiratory variables provides critical information in devising optimal policies. Given the potentially continuous availability of cardiorespiratory features extracted from bedside physiological waveform monitoring, the proposed framework paves the way for a real time recommendation system for sepsis treatment.

Keywords: Cardiovascular system; dimensionality reduction; intensive care unit; reinforcement learning; sepsis.

Grants and funding

The work of Li-wei H. Lehman was supported by NIH under Grant R01 EB030362 and Grant R01 EB017205.