Considerations to address missing data when deriving clinical trial endpoints from digital health technologies

Contemp Clin Trials. 2022 Feb:113:106661. doi: 10.1016/j.cct.2021.106661. Epub 2021 Dec 22.

Abstract

Digital health technologies (DHTs) enable us to measure human physiology and behavior remotely, objectively, and continuously. With the accelerated adoption of DHTs in clinical trials, there is an unmet need to identify statistical approaches to address missing data to ensure that the derived endpoints are valid, accurate, and reliable. It is not obvious how commonly used statistical methods to handle missing data in clinical trials can be directly applied to the complex data collected by DHTs. Meanwhile, current approaches used to address missing data from DHTs are of limited sophistication and focus on the exclusion of data where the quantity of missing data exceeds a given threshold. High-frequency time series data collected by DHTs are often summarized to derive epoch-level data, which are then processed to compute daily summary measures. In this article, we discuss characteristics of missing data collected by DHTs; review emerging statistical approaches for addressing missingness in epoch-level data, including within-patient imputations across common time periods, functional data analysis, and deep learning methods; and review imputation approaches and robust modeling appropriate for handling missing data in daily summary measures. We discuss strategies for minimizing missing data by optimizing DHT deployment and by including the patients' perspectives in the study design. We believe that these approaches provide more insight into preventing missing data when deriving digital endpoints. We hope this article can serve as a starting point for further discussion among clinical trial stakeholders.
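To make the epoch-level and daily-summary terminology above concrete, the following is a minimal sketch, not the authors' method. It contrasts a threshold-based exclusion rule with one simple form of within-patient imputation across common time periods (filling a missing epoch with the same patient's mean at that time of day on other days). The function names, the 0.8 coverage threshold, the choice of a daily sum as the summary measure, and the toy step counts are all assumptions made for illustration.

```python
from statistics import mean
from typing import List, Optional

# One day of epoch-level data: one value per epoch, None marks a missing epoch.
Day = List[Optional[float]]


def valid_days(days: List[Day], min_coverage: float = 0.8) -> List[Day]:
    """Threshold rule: keep only days whose observed fraction meets min_coverage."""
    return [d for d in days
            if sum(v is not None for v in d) / len(d) >= min_coverage]


def impute_within_patient(days: List[Day]) -> List[Day]:
    """Fill each missing epoch with this patient's mean at the same epoch index."""
    n_epochs = len(days[0])
    epoch_means: List[Optional[float]] = []
    for i in range(n_epochs):
        observed = [d[i] for d in days if d[i] is not None]
        # Stays None if the epoch was never observed on any day.
        epoch_means.append(mean(observed) if observed else None)
    return [[epoch_means[i] if v is None else v for i, v in enumerate(d)]
            for d in days]


def daily_totals(days: List[Day]) -> List[Optional[float]]:
    """Daily summary measure: sum of epochs, or None if any epoch is still missing."""
    return [None if any(v is None for v in d) else sum(d) for d in days]


# Hypothetical patient: 3 days x 4 epochs of step counts (assumed numbers).
patient: List[Day] = [
    [100, 200, None, 50],
    [120, None, 300, 70],
    [80, 180, 260, None],
]

excluded_view = valid_days(patient, min_coverage=0.8)
imputed_totals = daily_totals(impute_within_patient(patient))
```

In this toy data every day has 75% coverage, so the threshold rule at 0.8 discards all three days, while the within-patient fill yields a daily total for each of them. A real analysis would also need to consider why the epochs are missing, which this sketch ignores.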

Keywords: DHT deployment; Digital health technologies; Missing data; Statistical methods for missing data.

MeSH terms

  • Humans
  • Research Design*