On-line tools for sequence retrieval and multivariate statistics in molecular biology

Comput Appl Biosci. 1996 Feb;12(1):63-9. doi: 10.1093/bioinformatics/12.1.63.

Abstract

We have developed a World-Wide Web server for browsing sequence collections structured in the ACNUC format and for performing multivariate analyses on sequences. General collections (such as GenBank or EMBL) as well as specialized data banks (such as Hovergen and NRSub) can be accessed. The system allows complex queries to be constructed, and the result of each query, a list of sequences, is stored on the server. This list can then be reused as input to multivariate analyses of the sequences. Two example applications are shown. The first is a study of codon usage by correspondence analysis of all the protein-coding genes of Haemophilus influenzae Rd, which allows the highly expressed genes and the integral membrane proteins of this organism to be identified. The second is an ordering of 70 aligned growth hormone protein sequences by principal coordinate analysis. With this method, we recover the patterns of relationships between the sequences previously determined with tree-building programs.
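
To make the second application more concrete, the following is a minimal sketch of principal coordinate analysis on aligned sequences. It is not the server's implementation: the abstract does not say which distance measure was used, so the proportion of differing sites (p-distance) is assumed here, only the first axis is extracted (by power iteration), and the function names and the four toy sequences are hypothetical placeholders rather than growth hormone data.

```python
import math


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ (assumed measure)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def gower_center(dist):
    """Double-centre the matrix -0.5 * d^2 (Gower's transformation)."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in a]  # equals the column means: a is symmetric
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def leading_eigenpair(m, iterations=200):
    """Power iteration for the eigenvalue of largest magnitude of a symmetric matrix."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("degenerate matrix (are all sequences identical?)")
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * mv[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, v


def pcoa_first_axis(sequences):
    """Return (eigenvalue, coordinates) of the first principal coordinate axis."""
    n = len(sequences)
    dist = [[p_distance(sequences[i], sequences[j]) for j in range(n)]
            for i in range(n)]
    eigenvalue, vector = leading_eigenpair(gower_center(dist))
    if eigenvalue <= 0.0:
        raise ValueError("leading eigenvalue is not positive; no real coordinates")
    scale = math.sqrt(eigenvalue)
    return eigenvalue, [scale * x for x in vector]


# Hypothetical toy alignment: two obvious groups of two sequences each.
toy = ["AAAA", "AAAT", "TTTT", "TTTA"]
eigenvalue, axis1 = pcoa_first_axis(toy)
for seq, coord in zip(toy, axis1):
    print(seq, round(coord, 3))
```

On the toy alignment, the first axis places the two A-rich sequences on one side of the origin and the two T-rich sequences on the other; the sign of the axis is arbitrary. A full analysis would extract several axes with a proper symmetric eigensolver rather than power iteration.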

MeSH terms

  • Algorithms
  • Animals
  • Biometry
  • Codon / genetics
  • Computer Communication Networks*
  • Databases, Factual*
  • Evaluation Studies as Topic
  • Growth Hormone / genetics
  • Haemophilus influenzae / genetics
  • Molecular Biology / statistics & numerical data*
  • Multivariate Analysis
  • Online Systems*
  • Sequence Analysis / methods*
  • Sequence Analysis / statistics & numerical data

Substances

  • Codon
  • Growth Hormone