Is Seeing Believing? A Practitioner's Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies

Kun Fan; Srijana Subedi; Gongshun Yang; Xi Lu; Jie Ren; Cen Wu

doi:10.3390/e26090794

Entropy (Basel). 2024 Sep 16;26(9):794. doi: 10.3390/e26090794.

Authors

Kun Fan¹, Srijana Subedi¹, Gongshun Yang¹, Xi Lu², Jie Ren³, Cen Wu¹

Affiliations

¹ Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.
² Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston, Houston, TX 77204, USA.
³ Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.

Abstract

Variable selection methods have been extensively developed for and applied to cancer genomics data to identify important omics features associated with complex disease traits, including cancer outcomes. However, the reliability and reproducibility of the findings are in question when no valid inferential procedure is available to quantify their uncertainty. In this article, we provide a gentle but systematic review of high-dimensional frequentist and Bayesian inferential tools under sparse models that yield uncertainty quantification measures, including confidence (or Bayesian credible) intervals, p values and false discovery rates (FDR). Connections between the two realms in high-dimensional inference are fully exploited under the "unpenalized loss function + penalty term" formulation for regularization methods and the "likelihood function × shrinkage prior" framework for regularized Bayesian analysis. In particular, we advocate for robust Bayesian variable selection in cancer genomics studies because it can accommodate disease heterogeneity, in the form of heavy-tailed errors and structured sparsity, while providing valid statistical inference. The numerical results show that, compared with alternative methods, robust Bayesian analysis incorporating exact sparsity yields not only superior estimation and identification results but also valid Bayesian credible intervals at nominal coverage probabilities, especially in the presence of heavy-tailed model errors and outliers.
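
As an illustration of the connection between the two formulations named in the abstract, the following sketch works through the standard LASSO case. It is not taken from the article: the linear model, the symbols y, X, β, σ², λ and τ, and the choice of a Gaussian likelihood with an independent Laplace (double-exponential) prior are assumptions made here for concreteness.

```latex
% Frequentist side: unpenalized loss + penalty term (LASSO, assumed example)
\hat{\beta}_{\text{pen}}
  = \arg\min_{\beta}\;
    \underbrace{\tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2}_{\text{unpenalized loss}}
    \;+\;
    \underbrace{\lambda \sum_{j=1}^{p} \lvert \beta_j \rvert}_{\text{penalty term}}

% Bayesian side: likelihood x shrinkage prior
% (Gaussian errors with variance sigma^2, Laplace prior with rate tau; both assumed)
\pi(\beta \mid y)
  \;\propto\;
  \underbrace{\exp\!\Big\{-\tfrac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2\Big\}}_{\text{likelihood}}
  \times
  \underbrace{\prod_{j=1}^{p} \exp\{-\tau \lvert \beta_j \rvert\}}_{\text{shrinkage prior}}

% Negative log-posterior, up to an additive constant
-\log \pi(\beta \mid y)
  = \tfrac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2
    + \tau \sum_{j=1}^{p} \lvert \beta_j \rvert
    + \text{const}

% Multiplying by sigma^2 shows that the posterior mode is the penalized
% estimator with lambda = sigma^2 * tau
\hat{\beta}_{\text{mode}} = \hat{\beta}_{\text{pen}}
  \quad\text{when}\quad \lambda = \sigma^2 \tau
```

Under the same assumed setup, replacing the squared-error loss with the absolute-error loss corresponds to replacing the Gaussian likelihood with a Laplace one, which is one common way to allow for heavy-tailed errors. The posterior mode matches only the penalized point estimate; credible intervals require the full posterior distribution.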

Keywords: exact sparsity; frequentist and Bayesian variable selection; regularized variable selection; robust Bayesian inference; uncertainty quantification.

Publication types

Review

Grants and funding

This work was partially supported by an Innovative Research Award from the Johnson Cancer Research Center at Kansas State University.