Learning gene functional classifications from multiple data types

J Comput Biol. 2002;9(2):401-11. doi: 10.1089/10665270252935539.

Abstract

In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence comparisons. We demonstrate the application of the support vector machine (SVM) learning algorithm to this functional inference task. Our results suggest the importance of exploiting prior information about the heterogeneity of the data. In particular, we propose an SVM kernel function that is explicitly heterogeneous. In addition, we describe feature scaling methods for further exploiting prior knowledge of heterogeneity by giving each data type different weights.
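
The abstract does not spell out the form of the heterogeneous kernel, so the following is only a minimal sketch under stated assumptions: each gene is represented as one feature vector whose first `n_expr` entries are expression measurements and whose remaining entries are a phylogenetic profile, each data type gets its own polynomial kernel, and the per-type kernels are summed. The degree of 3, the function names, and the `scale_expr` / `scale_phylo` factors (one simple way to give each data type a different weight) are illustrative choices, not values taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def poly_kernel(a: Vector, b: Vector, degree: int = 3) -> float:
    """Polynomial kernel (a.b + 1)^degree; the degree is an assumed value."""
    return (dot(a, b) + 1.0) ** degree


def homogeneous_kernel(x: Vector, y: Vector, degree: int = 3) -> float:
    """Baseline: one kernel over the concatenated features of both data types."""
    return poly_kernel(x, y, degree)


def heterogeneous_kernel(
    x: Vector,
    y: Vector,
    n_expr: int,
    degree: int = 3,
    scale_expr: float = 1.0,   # hypothetical per-type feature scaling factor
    scale_phylo: float = 1.0,  # hypothetical per-type feature scaling factor
) -> float:
    """Sum of one kernel per data type (assumed form of the heterogeneous kernel).

    The first n_expr features are treated as expression data and the rest as
    the phylogenetic profile, so no product terms mix the two data types.
    """
    x_expr = [scale_expr * v for v in x[:n_expr]]
    y_expr = [scale_expr * v for v in y[:n_expr]]
    x_phylo = [scale_phylo * v for v in x[n_expr:]]
    y_phylo = [scale_phylo * v for v in y[n_expr:]]
    return poly_kernel(x_expr, y_expr, degree) + poly_kernel(x_phylo, y_phylo, degree)


def gram_matrix(
    rows: Sequence[Vector], kernel: Callable[[Vector, Vector], float]
) -> List[List[float]]:
    """Kernel matrix over all pairs of genes, as an SVM trainer would consume it."""
    return [[kernel(a, b) for b in rows] for a in rows]


# Toy genes: two expression values followed by two phylogenetic-profile bits.
genes = [
    [1.0, 2.0, 1.0, 0.0],
    [0.5, -1.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
]
K = gram_matrix(genes, lambda a, b: heterogeneous_kernel(a, b, n_expr=2))
```

The resulting matrix `K` could be passed to any SVM implementation that accepts a precomputed kernel. A sum of valid kernels is itself a valid kernel, which is what makes this kind of per-type combination safe to use in place of a single kernel over the concatenated features.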

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Computational Biology
  • Databases, Genetic
  • Gene Expression Profiling / statistics & numerical data*
  • Genes, Fungal
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Phylogeny*
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae Proteins / genetics

Substances

  • Saccharomyces cerevisiae Proteins