Supervised machine learning to predict smoking lapses from Ecological Momentary Assessments and sensor data: Implications for just-in-time adaptive intervention development

PLOS Digit Health. 2024 Aug 23;3(8):e0000594. doi: 10.1371/journal.pdig.0000594. eCollection 2024 Aug.

Abstract

Specific moments of lapse among smokers attempting to quit often lead to full relapse, which highlights a need for interventions that target lapses before they might occur, such as just-in-time adaptive interventions (JITAIs). To inform the decision points and tailoring variables of a lapse prevention JITAI, we trained and tested supervised machine learning algorithms that use Ecological Momentary Assessments (EMAs) and wearable sensor data of potential lapse triggers and lapse incidence. We aimed to identify a best-performing and feasible algorithm to take forwards in a JITAI. For 10 days, adult smokers attempting to quit were asked to complete 16 hourly EMAs/day assessing cravings, mood, activity, social context, physical context, and lapse incidence, and to wear a Fitbit Charge 4 during waking hours to passively collect data on steps and heart rate. A series of group-level supervised machine learning algorithms (e.g., Random Forest, XGBoost) were trained and tested, without and with the sensor data. Their ability to predict lapses for out-of-sample (i) observations and (ii) individuals were evaluated. Next, a series of individual-level and hybrid (i.e., group- and individual-level) algorithms were trained and tested. Participants (N = 38) responded to 6,124 EMAs (with 6.9% of responses reporting a lapse). Without sensor data, the best-performing group-level algorithm had an area under the receiver operating characteristic curve (AUC) of 0.899 (95% CI = 0.871-0.928). Its ability to classify lapses for out-of-sample individuals ranged from poor to excellent (AUCper person = 0.524-0.994; median AUC = 0.639). 15/38 participants had adequate data for individual-level algorithms to be constructed, with a median AUC of 0.855 (range: 0.451-1.000). Hybrid algorithms could be constructed for 25/38 participants, with a median AUC of 0.692 (range: 0.523 to 0.998). With sensor data, the best-performing group-level algorithm had an AUC of 0.952 (95% CI = 0.933-0.970). Its ability to classify lapses for out-of-sample individuals ranged from poor to excellent (AUCper person = 0.494-0.979; median AUC = 0.745). 11/30 participants had adequate data for individual-level algorithms to be constructed, with a median AUC of 0.983 (range: 0.549-1.000). Hybrid algorithms could be constructed for 20/30 participants, with a median AUC of 0.772 (range: 0.444 to 0.968). In conclusion, high-performing group-level lapse prediction algorithms without and with sensor data had variable performance when applied to out-of-sample individuals. Individual-level and hybrid algorithms could be constructed for a limited number of individuals but had improved performance, particularly when incorporating sensor data for participants with sufficient wear time. Feasibility constraints and the need to balance multiple success criteria in the JITAI development and implementation process are discussed.

Grants and funding

OP is supported by a Marie Skłodowska-Curie Postdoctoral Fellowship from the European Union (Grant Agreement number: 101065293). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. The European Union cannot be held responsible for them. DK, CL, and JB receive salary support from Cancer Research UK (PRCRPG-Nov21\100002). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.