Improving protein-protein interaction prediction based on phylogenetic information using a least-squares support vector machine

Ann N Y Acad Sci. 2007 Dec:1115:154-67. doi: 10.1196/annals.1407.005. Epub 2007 Oct 9.

Abstract

Predicting protein-protein interactions has become a key step in reverse-engineering biological networks to better understand cellular functions. Experimental methods for determining protein-protein interactions are time-consuming and costly, which has motivated vigorous development of computational approaches for predicting them. A set of recently developed bioinformatics methods uses coevolutionary information about the interacting partners (e.g., as exhibited in the form of correlations between distance matrices, where, for each protein, a matrix stores the pairwise distances between the protein and its orthologs in a group of reference genomes). We propose a novel method that accounts for the intra-matrix correlations to improve predictive accuracy. The distance matrices for a pair of proteins are transformed and concatenated into a phylogenetic vector. A least-squares support vector machine is trained and tested on pairs of proteins, represented as phylogenetic vectors, whose interactions are known. The intra-matrix correlations are accounted for by introducing a weighted linear kernel, which determines the dot product of two phylogenetic vectors. The performance, measured as the receiver operating characteristic (ROC) score in cross-validation experiments, shows a significant improvement of our method (ROC score 0.928) over that obtained by Pearson correlations (0.659).
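The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the distance matrices are transformed, how the kernel weights are chosen, or how the classifier is regularized. The sketch assumes that each matrix is flattened to its upper triangle, that the weights are a user-supplied vector w, and that the least-squares support vector machine is solved in its standard linear-system form (regression on +1/-1 labels) with a hypothetical regularization parameter gamma. All function names are illustrative.

```python
import numpy as np

def upper_triangle(dist):
    """Flatten the strict upper triangle of a symmetric distance matrix."""
    iu = np.triu_indices_from(dist, k=1)
    return dist[iu]

def pearson_score(dist_a, dist_b):
    """Baseline: Pearson correlation between two proteins' distance matrices."""
    return float(np.corrcoef(upper_triangle(dist_a), upper_triangle(dist_b))[0, 1])

def phylogenetic_vector(dist_a, dist_b):
    """Assumed transformation: concatenate the two flattened matrices."""
    return np.concatenate([upper_triangle(dist_a), upper_triangle(dist_b)])

def weighted_linear_kernel(X, Z, w):
    """K[i, j] = sum_k w[k] * X[i, k] * Z[j, k]; w is an assumed weight vector."""
    return (X * w) @ Z.T

def lssvm_train(K, y, gamma=1.0):
    """Solve the LS-SVM linear system for the dual weights alpha and bias b:

        [0   1^T          ] [b    ]   [0]
        [1   K + I / gamma] [alpha] = [y]
    """
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]

def lssvm_decision(K_test_train, alpha, b):
    """Decision values for test pairs; the sign gives the predicted class."""
    return K_test_train @ alpha + b
```

In use, each row of X would be the phylogenetic vector of one protein pair and y would hold +1 for known interacting pairs and -1 otherwise. Ranking held-out pairs by lssvm_decision and comparing that ranking against one produced by pearson_score is one way to set up the kind of ROC comparison the abstract reports.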

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Biomedical Engineering / methods
  • Computational Biology / methods
  • Computer Simulation
  • Data Interpretation, Statistical
  • Gene Expression / physiology*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / physiology*
  • Least-Squares Analysis
  • Models, Biological
  • Models, Statistical
  • Pattern Recognition, Automated / methods
  • Phylogeny
  • Protein Interaction Mapping / methods*
  • Proteome / metabolism*
  • Signal Transduction / physiology*

Substances

  • Proteome