Structured feature ranking for genomic marker identification accommodating multiple types of networks

Biometrics. 2024 Oct 3;80(4):ujae158. doi: 10.1093/biomtc/ujae158.

Abstract

Numerous statistical methods have been developed to search for genomic markers associated with the development, progression, and treatment response of complex diseases. Among them, feature ranking plays a vital role because of its intuitive formulation and computational efficiency. However, most existing methods rely on the marginal importance of molecular predictors and share a limitation: they do not adequately accommodate the dependence (network) structures among predictors, even though a disease phenotype usually reflects various biological processes that interact in a complex network. In this paper, we propose a structured feature ranking method for identifying genomic markers, in which such network structures are accommodated using Laplacian regularization. The proposed method covers multiple network scenarios, where the networks can be known a priori or estimated from the data. In addition, we rigorously examine the noise and uncertainty in the networks and control their impact through proper selection of tuning parameters. These characteristics give the proposed method especially broad applicability. Theoretical properties of the proposed method are rigorously established. Compared with the original marginal measure, the proposed network-structured measure can achieve the sure screening property with a faster convergence rate under mild conditions. Extensive simulations and an analysis of The Cancer Genome Atlas melanoma data demonstrate the improved finite-sample performance and practical usefulness of the proposed method.
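
To make the idea of network-structured ranking concrete, the following is a minimal sketch and not the authors' estimator. It assumes a generic form of Laplacian regularization: marginal scores w (absolute marginal correlations) are smoothed over a network by minimizing ||b - w||^2 + lam * b'Lb, which gives b = (I + lam * L)^{-1} w, where L is the graph Laplacian. All function names, the correlation-threshold network estimate, and the tuning parameter `lam` are hypothetical choices made for illustration.

```python
import numpy as np


def marginal_scores(X, y):
    """Absolute marginal correlation between each column of X and y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    return np.abs(Xs.T @ ys) / len(y)


def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A of a symmetric adjacency matrix."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A


def estimated_adjacency(X, threshold=0.5):
    """Hypothetical data-dependent network: connect predictors whose
    absolute sample correlation exceeds the threshold."""
    C = np.corrcoef(X, rowvar=False)
    A = (np.abs(C) > threshold).astype(float)
    np.fill_diagonal(A, 0.0)
    return A


def laplacian_smoothed_scores(w, A, lam):
    """Minimize ||b - w||^2 + lam * b' L b, i.e. solve (I + lam L) b = w."""
    L = graph_laplacian(A)
    return np.linalg.solve(np.eye(len(w)) + lam * L, w)


def rank_features(scores):
    """Feature indices ordered from highest to lowest score."""
    return np.argsort(-scores)


if __name__ == "__main__":
    # Simulated data (illustrative only): predictors 0-2 are correlated
    # through a shared factor and jointly drive the response.
    rng = np.random.default_rng(0)
    n, p = 100, 20
    X = rng.normal(size=(n, p))
    X[:, :3] += rng.normal(size=(n, 1))
    y = X[:, :3].sum(axis=1) + rng.normal(size=n)

    w = marginal_scores(X, y)
    A = estimated_adjacency(X, threshold=0.3)
    b = laplacian_smoothed_scores(w, A, lam=1.0)
    print("top 5 by marginal score:", rank_features(w)[:5])
    print("top 5 by smoothed score:", rank_features(b)[:5])
```

In this sketch, `lam = 0` recovers the marginal ranking, and larger values pull the scores of connected predictors toward each other. A noisy or uncertain network would argue for a smaller `lam`, which loosely mirrors the role the abstract assigns to tuning-parameter selection. A known network can be supplied directly as `A` in place of the estimated one.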

Keywords: graph Laplacian regularization; high dimensional data analysis; network structured analysis.

MeSH terms

  • Algorithms
  • Computer Simulation*
  • Genetic Markers / genetics
  • Genomics* / methods
  • Genomics* / statistics & numerical data
  • Humans
  • Melanoma* / genetics
  • Models, Statistical

Substances

  • Genetic Markers