Improving the efficiency of biomarker identification using biological knowledge

Pac Symp Biocomput. 2009:427-38.

Abstract

Identifying and validating biomarkers from high-throughput gene expression data is important for understanding and treating cancer. Typically, we identify candidate biomarkers as features that are differentially expressed between two or more classes of samples. Many feature selection metrics rely on ranking by some measure of differential expression. However, interpreting these results is difficult because the many existing algorithms and metrics may each produce different results. Moreover, a feature ranking metric may work well on some datasets but perform considerably worse on others. We propose a method to choose an optimal feature ranking metric for each individual dataset. A metric is optimal if, for a particular dataset, it favorably ranks features that are known to be relevant biomarkers. Extensive knowledge of biomarker candidates is available in public databases and the literature. Using this knowledge, we can choose the ranking metric that produces the most biologically meaningful results. In this paper, we first describe a framework for assessing the ability of a ranking metric to detect known relevant biomarkers. We then apply this method to clinical renal cancer microarray data to choose an optimal metric and identify several candidate biomarkers.
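
The selection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the two candidate metrics (absolute mean difference and absolute Welch t-statistic), the scoring rule (mean rank of the known biomarkers, lower is better), the gene names, and the toy expression values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def abs_mean_difference(a, b):
    """Hypothetical metric 1: absolute difference of class means."""
    return abs(mean(a) - mean(b))


def abs_welch_t(a, b):
    """Hypothetical metric 2: absolute Welch t-statistic."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(expression, labels, metric):
    """Return gene names sorted from most to least differentially expressed."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = metric(a, b)
    return sorted(scores, key=scores.get, reverse=True)


def mean_rank_of_known(ranking, known):
    """Assumed scoring rule: average 1-based rank of the known biomarkers."""
    positions = [ranking.index(g) + 1 for g in known if g in ranking]
    if not positions:
        raise ValueError("none of the known biomarkers appear in the ranking")
    return mean(positions)


def choose_metric(expression, labels, metrics, known):
    """Pick the metric that ranks the known biomarkers most favorably."""
    results = {
        name: mean_rank_of_known(rank_features(expression, labels, fn), known)
        for name, fn in metrics.items()
    }
    best = min(results, key=results.get)
    return best, results


# Toy data (invented): three genes, six samples, two classes.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "GENE_A": [5.0, 5.1, 4.9, 5.4, 5.5, 5.3],  # small but consistent shift
    "GENE_B": [2.0, 6.0, 4.0, 5.0, 9.0, 7.0],  # large but noisy shift
    "GENE_C": [3.0, 3.2, 2.8, 3.1, 2.9, 3.0],  # no shift
}
known_biomarkers = {"GENE_A"}  # stands in for knowledge from databases/literature
metrics = {"mean_difference": abs_mean_difference, "welch_t": abs_welch_t}

best, results = choose_metric(expression, labels, metrics, known_biomarkers)
print(best, results)
```

On this toy dataset the t-statistic places the assumed known biomarker first, while the mean difference places it second behind the noisy gene, so the t-statistic would be selected. A different dataset could reverse that outcome, which is the motivation for choosing the metric per dataset.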

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Biomarkers*
  • Biomarkers, Tumor / genetics
  • Biometry
  • Databases, Genetic
  • Genetic Markers
  • Humans
  • Kidney Neoplasms / classification
  • Kidney Neoplasms / genetics
  • Knowledge Bases*
  • Neoplasms / classification
  • Neoplasms / genetics
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Reverse Transcriptase Polymerase Chain Reaction / statistics & numerical data

Substances

  • Biomarkers
  • Biomarkers, Tumor
  • Genetic Markers