Correcting for Observation Bias in Cancer Progression Modeling

J Comput Biol. 2024 Oct;31(10):927-945. doi: 10.1089/cmb.2024.0666.

Abstract

Tumor progression is driven by the accumulation of genetic alterations, including both point mutations and copy number changes. Understanding the temporal sequence of these events is crucial for comprehending the disease but is not directly discernible from cross-sectional genomic data. Cancer progression models, including Mutual Hazard Networks (MHNs), aim to reconstruct the dynamics of tumor progression by learning the causal interactions between genetic events based on their co-occurrence patterns in cross-sectional data. Here, we highlight a commonly overlooked bias in cross-sectional datasets that can distort progression modeling. Tumors become clinically detectable when they cause symptoms or are identified through imaging or tests. Detection factors, such as size, inflammation (fever, fatigue), and elevated biochemical markers, are influenced by genomic alterations. Ignoring these effects leads to "conditioning on a collider" bias, where events making the tumor more observable appear anticorrelated, creating false suppressive effects or masking promoting effects among genetic events. We enhance MHNs by incorporating the effects of genetic progression events on the inclusion of a tumor in a dataset, thus correcting for collider bias. We derive an efficient tensor formula for the likelihood function and apply it to two datasets from the MSK-IMPACT study. In colon adenocarcinoma, we observe a significantly higher rate of clinical detection for TP53-positive tumors, while in lung adenocarcinoma, the same is true for EGFR-positive tumors. Compared to classical MHNs, this approach eliminates several spurious suppressive interactions and uncovers multiple promoting effects.
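
The collider effect described above can be illustrated with a toy simulation. This is a minimal static sketch and not the MHN model or the likelihood from the paper: two genetic events A and B occur independently, and each one raises the probability that a tumor is detected and therefore included in the dataset. All names and parameter values (`p_a`, `p_b`, `base`, `boost`) are hypothetical and chosen only for illustration.

```python
import random


def odds_ratio(samples):
    """Odds ratio of two binary events from a list of (a, b) pairs."""
    n = [[0, 0], [0, 0]]
    for a, b in samples:
        n[a][b] += 1
    return (n[1][1] * n[0][0]) / (n[1][0] * n[0][1])


def simulate(n_tumors=200_000, p_a=0.3, p_b=0.3, base=0.05, boost=0.4, seed=0):
    """Draw independent events A and B, then keep only 'detected' tumors.

    Assumption for illustration: the detection probability is
    base + boost * (number of events present), so each event makes
    the tumor more observable.
    """
    rng = random.Random(seed)
    all_tumors, observed = [], []
    for _ in range(n_tumors):
        a = int(rng.random() < p_a)  # event A, independent of B
        b = int(rng.random() < p_b)  # event B, independent of A
        all_tumors.append((a, b))
        # Inclusion in the dataset depends on both events (the collider).
        if rng.random() < base + boost * (a + b):
            observed.append((a, b))
    return odds_ratio(all_tumors), odds_ratio(observed)


or_all, or_observed = simulate()
print(f"odds ratio, all tumors:      {or_all:.2f}")
print(f"odds ratio, observed tumors: {or_observed:.2f}")
```

Across all simulated tumors the odds ratio stays close to 1, as expected for independent events. Among observed tumors it falls well below 1 (about 0.2 in expectation for these parameters), so A and B appear mutually suppressive even though neither affects the other. A model fitted only to the observed tumors, without accounting for how they were selected, would be exposed to this kind of spurious anticorrelation.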

Keywords: cancer progression model; collider bias; selection bias.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bias
  • Computational Biology / methods
  • Cross-Sectional Studies
  • Disease Progression*
  • Humans
  • Likelihood Functions
  • Lung Neoplasms / genetics
  • Lung Neoplasms / pathology
  • Neoplasms* / genetics
  • Neoplasms* / pathology
  • Tumor Suppressor Protein p53 / genetics

Substances

  • Tumor Suppressor Protein p53