LABEL: fast and accurate lineage assignment with assessment of H5N1 and H9N2 influenza A hemagglutinins

PLoS One. 2014 Jan 23;9(1):e86921. doi: 10.1371/journal.pone.0086921. eCollection 2014.

Abstract

The evolutionary classification of influenza genes into lineages is a first step in understanding their molecular epidemiology and can inform the subsequent implementation of control measures. We introduce a novel approach called Lineage Assignment By Extended Learning (LABEL) to rapidly determine cladistic information for any number of genes without the need for time-consuming sequence alignment, phylogenetic tree construction, or manual annotation. Instead, LABEL relies on hidden Markov model profiles and support vector machine training to hierarchically classify gene sequences by their similarity to pre-defined lineages. We assessed LABEL by analyzing the annotated hemagglutinin genes of highly pathogenic (H5N1) and low pathogenicity (H9N2) avian influenza A viruses. Using the WHO/FAO/OIE H5N1 evolution working group nomenclature, the LABEL pipeline quickly and accurately identified the H5 lineages of uncharacterized sequences. Moreover, we developed an updated clade nomenclature for the H9 hemagglutinin gene and show a similarly fast and reliable phylogenetic assessment with LABEL. While this study was focused on hemagglutinin sequences, LABEL could be applied to the analysis of any gene and shows great potential to guide molecular epidemiology activities, accelerate database annotation, and provide a data sorting tool for other large-scale bioinformatic studies.
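
The hierarchical assignment described in the abstract can be illustrated with a minimal sketch. It is not the LABEL implementation: LABEL scores sequences with hidden Markov model profiles and decides with trained support vector machines, whereas this toy version substitutes add-one-smoothed k-mer frequency profiles for the HMM scores and a simple "highest score wins" rule for the SVM. The `Clade` class, the functions `profile_score` and `assign`, the clade names (`A`, `A.1`, `A.2`, `B`), and the short sequences are all hypothetical and chosen only to show the top-down descent through pre-defined lineages without alignment or tree building.

```python
from collections import Counter
from math import log


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a nucleotide string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class Clade:
    """A node in a pre-defined lineage hierarchy (hypothetical structure)."""

    def __init__(self, name, references=(), children=()):
        self.name = name
        self.references = list(references)  # reference sequences annotated to this clade
        self.children = list(children)      # sub-clades, if any

    def all_references(self):
        """References of this clade plus those of every descendant."""
        refs = list(self.references)
        for child in self.children:
            refs.extend(child.all_references())
        return refs


def profile_score(seq, references, k=3):
    """Log-likelihood of seq's k-mers under a smoothed k-mer profile.

    Stand-in for an HMM profile score; assumes a 4-letter alphabet.
    """
    counts = Counter()
    for ref in references:
        counts.update(kmer_counts(ref, k))
    denom = sum(counts.values()) + 4 ** k  # add-one smoothing over all k-mers
    return sum(n * log((counts[kmer] + 1) / denom)
               for kmer, n in kmer_counts(seq, k).items())


def assign(seq, root, k=3):
    """Descend the hierarchy, choosing the best-scoring child at each level.

    The argmax here replaces the trained SVM decision used by LABEL.
    """
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children,
                   key=lambda c: profile_score(seq, c.all_references(), k))
        path.append(node.name)
    return path


# Toy hierarchy with made-up sequences (not real hemagglutinin data).
tree = Clade("root", children=[
    Clade("A", children=[
        Clade("A.1", references=["ACGTACGTACGTACGT"]),
        Clade("A.2", references=["ACGGACGGACGGACGG"]),
    ]),
    Clade("B", references=["TTTTCCCCTTTTCCCC"]),
])

print(assign("ACGTACGTACGA", tree))
print(assign("TTTTCCCCTTTA", tree))
```

Because each query is scored only against the children of the clade it has already been placed in, the cost per sequence grows with the depth and branching of the hierarchy rather than with the size of the whole dataset, which is the property that makes this style of assignment fast.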

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Bayes Theorem
  • Cell Lineage*
  • Chickens
  • Evolution, Molecular*
  • Hemagglutinin Glycoproteins, Influenza Virus / analysis*
  • Hemagglutinin Glycoproteins, Influenza Virus / genetics
  • Influenza A Virus, H5N1 Subtype / classification
  • Influenza A Virus, H5N1 Subtype / genetics
  • Influenza A Virus, H5N1 Subtype / pathogenicity
  • Influenza A Virus, H9N2 Subtype / classification
  • Influenza A Virus, H9N2 Subtype / genetics
  • Influenza A Virus, H9N2 Subtype / pathogenicity*
  • Influenza in Birds / genetics
  • Influenza in Birds / virology*
  • Phylogeny
  • Poultry Diseases / virology*
  • Sequence Analysis, DNA
  • Software

Substances

  • Hemagglutinin Glycoproteins, Influenza Virus

Grants and funding

This research was supported in part by an appointment to the Research Participation Program at the Centers for Disease Control and Prevention administered by the Oak Ridge Institute for Science and Education (to S.S.S.) through an interagency agreement between the U.S. Department of Energy and CDC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.