Using relational databases for improved sequence similarity searching and large-scale genomic analyses

Aaron J Mackey; William R Pearson

doi:10.1002/0471250953.bi0904s7

Curr Protoc Bioinformatics. 2004 Oct:Chapter 9:Unit 9.4. doi: 10.1002/0471250953.bi0904s7.

Authors

Aaron J Mackey¹, William R Pearson

Affiliation

¹ University of Virginia, Charlottesville, Virginia, USA.

PMID: 18428739
DOI: 10.1002/0471250953.bi0904s7

Abstract

Relational databases are designed to integrate diverse types of information and to manage large sets of search results, which greatly simplifies genome-scale analyses. They are essential for managing and analyzing the results of large-scale sequence analyses, and they can also improve the statistical significance of similarity searches by restricting a search to the subsets of a sequence library most likely to contain homologs. This unit describes how to use relational databases to make sequence similarity searching more efficient, and demonstrates several large-scale genomic analyses of homology-related data. It first covers the installation and use of a simple protein sequence database, seqdb_demo, which serves as the basis for the other protocols. Those protocols cover basic use of the database to generate a novel sequence library subset, extending seqdb_demo to store sequence similarity search results, and using various kinds of stored search results to address aspects of comparative genomic analysis.
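As a rough illustration of the library-subset idea, the following minimal Python sketch builds a toy protein table in an in-memory SQLite database, selects the sequences from one taxon, and writes them out in FASTA format. The table layout, column names, accessions, and sequences are all hypothetical placeholders and are not the actual seqdb_demo schema or data. The `rescale_evalue` helper assumes the common approximation that an expectation value scales linearly with the number of library sequences searched, so the same alignment score earns a proportionally smaller E-value against a smaller subset.

```python
import sqlite3


def build_demo_db(conn):
    """Create a toy protein table (hypothetical schema, not seqdb_demo's)."""
    conn.executescript(
        """
        CREATE TABLE protein (
            protein_id INTEGER PRIMARY KEY,
            accession  TEXT NOT NULL UNIQUE,
            taxon      TEXT NOT NULL,
            seq        TEXT NOT NULL
        );
        """
    )
    # Made-up accessions and sequences, for illustration only.
    rows = [
        ("DEMO0001", "Escherichia coli", "MKTAYIAKQR"),
        ("DEMO0002", "Escherichia coli", "MSDNELLKRA"),
        ("DEMO0003", "Homo sapiens", "MAAGVLLPQW"),
        ("DEMO0004", "Saccharomyces cerevisiae", "MSTPLKDRYE"),
    ]
    conn.executemany(
        "INSERT INTO protein (accession, taxon, seq) VALUES (?, ?, ?)", rows
    )


def subset_fasta(conn, taxon):
    """Return a FASTA-formatted library subset for a single taxon."""
    cur = conn.execute(
        "SELECT accession, taxon, seq FROM protein "
        "WHERE taxon = ? ORDER BY accession",
        (taxon,),
    )
    return "".join(f">{acc} {tax}\n{seq}\n" for acc, tax, seq in cur)


def rescale_evalue(evalue, full_size, subset_size):
    """Rescale an E-value for a smaller library.

    Assumes E is proportional to the number of sequences searched.
    """
    return evalue * subset_size / full_size


conn = sqlite3.connect(":memory:")
build_demo_db(conn)
fasta = subset_fasta(conn, "Escherichia coli")
print(fasta, end="")
```

Under the linear-scaling assumption, `rescale_evalue(0.5, 1000, 100)` gives 0.05: a marginal hit against a 1000-sequence library would look ten times more significant if the search had been restricted to a 100-sequence subset that still contained it.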

MeSH terms

Algorithms
Amino Acid Sequence
Chromosome Mapping / methods*
Database Management Systems*
Databases, Protein*
Information Storage and Retrieval / methods*
Molecular Sequence Data
Sequence Alignment / methods*
Sequence Analysis, Protein / methods*
Sequence Homology, Amino Acid*