Studying missingness in spinal cord injury data: challenges and impact of data imputation

BMC Med Res Methodol. 2024 Jan 6;24(1):5. doi: 10.1186/s12874-023-02125-x.

Abstract

Background: In recent decades, medical research fields studying rare conditions such as spinal cord injury (SCI) have made extensive efforts to collect large-scale data. However, most analysis methods rely on complete data. This is particularly problematic for clinical data, which are prone to missingness. Often, researchers mitigate this problem by removing patients with missing data from the analyses. Less commonly, imputation methods are applied to infer likely values.

Objective: Our objective was to study how the handling of missing data influences the reported results, taking SCI registries as an example. We aimed to raise awareness of the effects of missing data and to provide guidelines for future research projects, in SCI research and beyond.

Methods: Using the Sygen clinical trial data (n = 797), we analyzed the impact of the type of variable in which data are missing, the pattern according to which data are missing, and the imputation strategy (e.g., mean imputation, last observation carried forward, multiple imputation).

Results: Our simulations show that mean imputation may lead to results that deviate strongly from the underlying expected results. For repeated measures missing at late stages (≥ 6 months after injury in this simulation study), carrying the last observation forward seems the preferable imputation option. This simulation study shows that a one-size-fits-all imputation strategy falls short in SCI data sets.
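
As a minimal sketch of why the two single-imputation strategies can diverge for a repeated measure that plateaus, the following Python example fills a missing late-stage value by last observation carried forward and by mean imputation across patients. The scores, patient labels, and function names (`locf`, `mean_impute`) are hypothetical and chosen for illustration only; they are not taken from the Sygen data or from the authors' simulation code.

```python
from statistics import mean


def locf(series):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def mean_impute(column):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


# Hypothetical scores at five successive visits; None marks a missing visit.
# Each trajectory flattens towards a patient-specific plateau.
patients = {
    "A": [40, 60, 70, 72, None],  # final visit missing
    "B": [5, 10, 14, 15, 15],
    "C": [20, 30, 36, 38, 38],
}

# LOCF works within a patient: A's last observed score is reused.
locf_a = locf(patients["A"])

# Mean imputation works across patients at one visit: A receives the
# average final-visit score of B and C, ignoring A's own trajectory.
final_visit = [scores[-1] for scores in patients.values()]
mean_filled = mean_impute(final_visit)

print("LOCF for patient A:", locf_a)
print("Mean-imputed final visit (A, B, C):", mean_filled)
```

In this toy cohort, carrying the last observation forward keeps patient A near their own plateau, whereas mean imputation pulls the value towards the other patients' scores. Whether the same holds in a real data set depends on the missingness pattern and on how close the trajectory is to its plateau.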

Conclusions: Data-tailored imputation strategies are required (e.g., characterisation of the missingness pattern, last observation carried forward for repeated measures evolving to a plateau over time). Systematically reporting the extent and kind of missing data, and the decisions made in handling them, will therefore be essential to improve the interpretation, transparency, and reproducibility of the research presented.

Keywords: Imputation; Missing data; Simulation study; Spinal cord injury.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomedical Research*
  • Computer Simulation
  • Humans
  • Rare Diseases
  • Reproducibility of Results
  • Spinal Cord Injuries* / epidemiology
  • Spinal Cord Injuries* / therapy