Using relational databases for improved sequence similarity searching and large-scale genomic analyses

Curr Protoc Bioinformatics. 2004 Oct:Chapter 9:Unit 9.4. doi: 10.1002/0471250953.bi0904s7.

Abstract

Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. They are essential for managing and analyzing the results of large-scale sequence comparisons, and they can also improve the statistical significance of similarity searches by focusing each search on the subsets of sequence libraries most likely to contain homologs. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates several large-scale genomic analyses of homology-related data. It covers the installation and use of a simple protein sequence database, seqdb_demo, which serves as the basis for the other protocols. These protocols include basic use of the database to generate a novel sequence library subset, extending seqdb_demo to store sequence similarity search results, and using various kinds of stored search results to address aspects of comparative genomic analysis.
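The abstract does not give the seqdb_demo schema or the SQL used in the protocols. The following is therefore only a minimal sketch of the general idea, written with Python's built-in sqlite3 module. The table names (protein, hit), their columns, the toy sequences and E-values, and the helper functions (subset_fasta, library_size, expected_hits, best_hits) are all hypothetical. The sketch shows three things. First, it selects a library subset (here by taxon, a hypothetical criterion) and writes it as FASTA for a search program. Second, it shows why a smaller library can improve significance, assuming FASTA-style statistics in which the E-value equals the per-comparison probability multiplied by the number of library sequences searched. Third, it queries stored search results for the best hit per query.

```python
import sqlite3

# Hypothetical minimal schema for illustration only; the actual seqdb_demo
# tables described in the unit are not given in this abstract.
SCHEMA = """
CREATE TABLE protein (
    prot_id INTEGER PRIMARY KEY,
    acc     TEXT UNIQUE NOT NULL,
    taxon   TEXT NOT NULL,
    seq     TEXT NOT NULL
);
CREATE TABLE hit (
    query_acc TEXT NOT NULL,   -- query sequence accession
    lib_acc   TEXT NOT NULL,   -- library sequence accession
    evalue    REAL NOT NULL    -- E-value reported by the search program
);
"""


def load_demo(conn):
    """Create the hypothetical tables and fill them with toy data."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO protein (acc, taxon, seq) VALUES (?, ?, ?)",
        [
            ("P1", "E. coli", "MKTAYIAKQR"),
            ("P2", "E. coli", "MSLLTEVETP"),
            ("P3", "S. cerevisiae", "MKTAYIAKQL"),
            ("P4", "H. sapiens", "MSLLTEVETQ"),
        ],
    )
    conn.executemany(
        "INSERT INTO hit (query_acc, lib_acc, evalue) VALUES (?, ?, ?)",
        [("P1", "P3", 1e-30), ("P1", "P4", 0.5),
         ("P2", "P4", 1e-20), ("P2", "P3", 2.0)],
    )


def _placeholders(items):
    # Only the number of "?" markers is interpolated; values stay bound parameters.
    return ",".join("?" for _ in items)


def subset_fasta(conn, taxa):
    """Return a FASTA-formatted library containing only proteins from `taxa`."""
    rows = conn.execute(
        "SELECT acc, taxon, seq FROM protein "
        f"WHERE taxon IN ({_placeholders(taxa)}) ORDER BY acc",
        list(taxa),
    )
    return "".join(f">{acc} {taxon}\n{seq}\n" for acc, taxon, seq in rows)


def library_size(conn, taxa=None):
    """Number of sequences in the full library, or in the subset for `taxa`."""
    if taxa is None:
        return conn.execute("SELECT COUNT(*) FROM protein").fetchone()[0]
    sql = f"SELECT COUNT(*) FROM protein WHERE taxon IN ({_placeholders(taxa)})"
    return conn.execute(sql, list(taxa)).fetchone()[0]


def expected_hits(p_value, n_library):
    """Assumed FASTA-style E-value: per-comparison probability x library size."""
    return p_value * n_library


def best_hits(conn, max_evalue):
    """Lowest-E-value stored hit per query, kept only if E <= max_evalue."""
    return conn.execute(
        """SELECT h.query_acc, h.lib_acc, h.evalue FROM hit AS h
           WHERE h.evalue <= ?
             AND h.evalue = (SELECT MIN(evalue) FROM hit
                             WHERE query_acc = h.query_acc)
           ORDER BY h.query_acc""",
        (max_evalue,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_demo(conn)
    print(subset_fasta(conn, ["E. coli"]), end="")
    print(library_size(conn), library_size(conn, ["E. coli"]))
    print(best_hits(conn, 1e-3))
```

Under the stated assumption, and with purely illustrative numbers, an alignment whose per-comparison probability is 1e-6 gives E ≈ 0.5 against a 500,000-sequence library but E ≈ 0.005 against a 5,000-sequence subset. The score itself is unchanged; only the number of comparisons it is corrected for shrinks. The best-hit query uses a correlated subquery rather than a vendor-specific extension so that it stays portable across SQL databases.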

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Chromosome Mapping / methods*
  • Database Management Systems*
  • Databases, Protein*
  • Information Storage and Retrieval / methods*
  • Molecular Sequence Data
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid*