Learning multiple evolutionary pathways from cross-sectional data

J Comput Biol. 2005 Jul-Aug;12(6):584-98. doi: 10.1089/cmb.2005.12.584.

Abstract

We introduce a mixture model of trees to describe evolutionary processes that are characterized by the ordered accumulation of permanent genetic changes. The basic building block of the model is a directed weighted tree that generates a probability distribution on the set of all patterns of genetic events. We present an EM-like algorithm for learning a mixture model of K trees and show how to determine K with a maximum likelihood approach. As a case study, we consider the accumulation of mutations in the HIV-1 reverse transcriptase that are associated with drug resistance. The fitted model is statistically validated as a density estimator, and the stability of the model topology is analyzed. We obtain a generative probabilistic model for the development of drug resistance in HIV that agrees with biological knowledge. Further applications and extensions of the model are discussed.
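The abstract does not give the model's formulas, so the following is only a minimal sketch of how a directed weighted tree could induce a probability distribution on patterns of events, and how a mixture of K such trees could be evaluated. It assumes each edge weight is the conditional probability that an event occurs given that its parent event has occurred, and that an event cannot occur before its parent. The event names `A` and `B`, all numeric weights, and the helpers `tree_prob`, `mixture_prob`, and `responsibilities` are hypothetical illustrations, not the authors' implementation. The `responsibilities` function shows the kind of quantity an E-step in an EM-like procedure would compute; the tree-fitting step is omitted.

```python
from itertools import product

ROOT = "root"  # hypothetical label for the event-free starting state


def tree_prob(pattern, parent, weight):
    """Probability of a binary event pattern under one weighted tree.

    pattern: dict event -> 0/1
    parent:  dict event -> parent event (or ROOT)
    weight:  dict event -> assumed P(event occurs | parent occurred)
    """
    prob = 1.0
    for event, par in parent.items():
        parent_present = par == ROOT or pattern[par] == 1
        if parent_present:
            # The edge is "active": the event occurs with its edge weight.
            prob *= weight[event] if pattern[event] else 1.0 - weight[event]
        elif pattern[event]:
            # Event present without its parent: impossible under this tree.
            return 0.0
    return prob


def mixture_prob(pattern, components):
    """components: list of (mixing weight, parent map, edge weights)."""
    return sum(a * tree_prob(pattern, p, w) for a, p, w in components)


def responsibilities(pattern, components):
    """Posterior probability of each tree given the pattern (E-step quantity)."""
    joint = [a * tree_prob(pattern, p, w) for a, p, w in components]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component mixture over two events.
chain = (0.8, {"A": ROOT, "B": "A"}, {"A": 0.5, "B": 0.4})   # A must precede B
star = (0.2, {"A": ROOT, "B": ROOT}, {"A": 0.1, "B": 0.1})   # independent events
components = [chain, star]

for a, b in product([0, 1], repeat=2):
    x = {"A": a, "B": b}
    print(x, round(mixture_prob(x, components), 4))
```

In this sketch the star-shaped component gives every pattern nonzero probability, so patterns that violate the chain ordering (here, `B` without `A`) are attributed entirely to it. Selecting K by likelihood, as the abstract describes, would involve comparing fitted mixtures with different numbers of components, which this sketch does not attempt.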

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Anti-HIV Agents / pharmacology*
  • Biological Evolution*
  • Computer Simulation
  • Cross-Sectional Studies
  • Drug Resistance, Viral / genetics*
  • Genetic Variation / genetics
  • HIV Reverse Transcriptase / antagonists & inhibitors
  • HIV-1 / drug effects
  • HIV-1 / genetics*
  • Humans
  • Learning
  • Likelihood Functions
  • Models, Genetic*
  • Mutagenesis / genetics
  • Mutation
  • Pattern Recognition, Automated
  • Signal Processing, Computer-Assisted*

Substances

  • Anti-HIV Agents
  • HIV Reverse Transcriptase