A statistical model for HIV-1 sequence classification using the subtype analyser (STAR)

Bioinformatics. 2005 Sep 1;21(17):3535-40. doi: 10.1093/bioinformatics/bti569. Epub 2005 Jul 26.

Abstract

Motivation: HIV-1 antiretroviral drug resistance testing produces large numbers of HIV-1 protease and reverse transcriptase sequences. These provide an excellent resource to study the incidence, spread and clinical significance of HIV-1 subtypes. We have produced a program, Subtype Analyser (STAR), that rapidly and accurately subtypes HIV-1. Here we have determined a robust and statistically validated model for subtype assignment.

Results: We have significantly extended our HIV-1 subtyping tool (STAR), such that each query sequence, when evaluated against subtype profile alignments, returns a discriminating score based on the ratio of subtype positive to negative amino acid positions. These scores were transformed into a Z-score distribution and evaluated. Of the 141 sequences used to define the subtype alignments, 98% were correctly reclassified. Inclusion of additional recombination detection within STAR increased the detection of known recombinant sequences to 95%.
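
As a rough illustration of the scoring idea described above, the following Python sketch computes a positive-to-negative ratio and standardises it as a Z-score. It is not the STAR implementation: the function names, the +1 pseudocount in the ratio, the use of a reference score distribution, and the cutoff of 2.0 are all assumptions made for this example, since the abstract does not specify them.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Sequence


def discriminating_score(positive: int, negative: int) -> float:
    """Ratio of subtype-positive to subtype-negative amino acid positions.

    The +1 pseudocount is an assumption of this sketch; it only guards
    against division by zero when no negative positions are found.
    """
    return positive / (negative + 1)


def z_score(score: float, reference_scores: Sequence[float]) -> float:
    """Standardise a raw score against a reference score distribution.

    The choice of reference distribution is hypothetical here.
    Uses the sample standard deviation (n - 1 denominator).
    """
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)
    if sigma == 0:
        raise ValueError("reference scores have zero variance")
    return (score - mu) / sigma


def assign_subtype(z_by_subtype: Dict[str, float],
                   threshold: float = 2.0) -> Optional[str]:
    """Return the subtype with the highest Z-score, or None if no
    subtype reaches the (assumed) threshold."""
    best = max(z_by_subtype, key=z_by_subtype.get)
    return best if z_by_subtype[best] >= threshold else None
```

Under these assumptions, a query would be scored against each subtype profile in turn, each raw score standardised with `z_score`, and the resulting dictionary passed to `assign_subtype`; a `None` result would mark the sequence as unassigned.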

Availability: STAR is available as compiled (Linux Fedora 3) or source code from http://pgv19.virol.ucl.ac.uk/download/star_linux.tar

Contact: [email protected]

Supplementary information: http://pgv19.virol.ucl.ac.uk/download/star_supplement

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Chromosome Mapping / methods*
  • HIV Protease / analysis
  • HIV Protease / chemistry*
  • HIV Protease / genetics*
  • Models, Genetic*
  • Models, Statistical
  • Molecular Sequence Data
  • Pattern Recognition, Automated / methods*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Software

Substances

  • HIV Protease