Technical and Biological Biases in Bulk Transcriptomic Data Mining for Cancer Research

J Cancer. 2025 Jan 1;16(1):34-43. doi: 10.7150/jca.100922. eCollection 2025.

Abstract

High-throughput sequencing technologies such as RNA sequencing (RNA-seq) have made transcriptomic data central to cancer research. This review examines the impact of transcriptomics on the understanding of cancer biology, focusing on large public datasets such as The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). Although transcriptomic data provides crucial insights into gene expression patterns and disease mechanisms, its analysis is subject to both technical and biological biases. Technical biases stem from the measurement platform, whether microarray, RNA-seq, or nanopore sequencing, while biological biases arise from factors such as tumor heterogeneity and sample purity. Misinterpretation is also common when correlation is assumed to imply causation or when signals in bulk data are attributed to specific cell types. The review emphasizes that researchers need to understand and mitigate these biases to ensure accurate data interpretation and reliable clinical outcomes. By addressing these challenges, it aims to make cancer research more robust and to improve the use of transcriptomic data in developing effective therapies and diagnostic tools.

Publication types

  • Review