RAPSearch: a fast protein similarity search tool for short reads

Yuzhen Ye; Jeong-Hyeon Choi; Haixu Tang

doi:10.1186/1471-2105-12-159

BMC Bioinformatics. 2011 May 15:12:159. doi: 10.1186/1471-2105-12-159.

Authors

Yuzhen Ye¹, Jeong-Hyeon Choi, Haixu Tang

Affiliation

¹ School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA.

Abstract

Background: Next Generation Sequencing (NGS) is producing enormous corpora of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search, a key step in annotating the protein-coding genes in these short reads and identifying their biological functions, faces daunting challenges because of the sheer size of the short read datasets.

Results: We developed RAPSearch, a fast protein similarity search tool that uses a reduced amino acid alphabet and a suffix array to detect seeds of flexible length. For the short reads we tested (translated in six frames), RAPSearch achieved a ~20-90-fold speedup over BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, showed a significant loss of sensitivity compared to RAPSearch and BLAST.
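
The idea of seeding with a reduced amino acid alphabet and a suffix array can be illustrated with a small sketch. This is a toy under stated assumptions, not RAPSearch's implementation: the residue grouping in `GROUPS`, the function names, and the `min_len` threshold are all hypothetical, and the suffix array is built naively rather than with an efficient construction algorithm. Similar residues are collapsed to one symbol, the reduced subject is indexed, and for each query position the longest exact match in reduced space is reported as a flexible-length seed.

```python
# Hypothetical residue grouping for illustration only; it is NOT the
# reduced alphabet used by RAPSearch.
GROUPS = ["AST", "C", "DN", "EQ", "FY", "G", "H", "ILMV", "KR", "P", "W"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_seq(protein):
    """Map each residue to its group representative ('X' if unknown)."""
    return "".join(REDUCE.get(aa, "X") for aa in protein)


def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_seed(query, text, sa, min_len=3):
    """Longest prefix of `query` found in `text`, as (length, position).

    Binary search locates where `query` would be inserted among the sorted
    suffixes; the best match is one of the two neighbouring suffixes.
    Returns None if the match is shorter than `min_len`.
    """
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    best = (0, -1)
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            n = common_prefix_len(query, text[sa[idx]:])
            if n > best[0]:
                best = (n, sa[idx])
    return best if best[0] >= min_len else None


def find_seeds(query, subject, min_len=3):
    """Return (query_pos, subject_pos, length) seeds in reduced space."""
    q, s = reduce_seq(query), reduce_seq(subject)
    sa = build_suffix_array(s)
    seeds = []
    for i in range(len(q)):
        hit = longest_seed(q[i:], s, sa, min_len)
        if hit:
            seeds.append((i, hit[1], hit[0]))
    return seeds


# The two peptides differ at several residues but are identical once reduced.
print(find_seeds("MRILSAGV", "MKVLAAGIT"))
```

In this toy example the query and subject share no long exact match in the 20-letter alphabet, yet the reduced sequences match over the full query length, which is the kind of tolerance to conservative substitutions that a reduced alphabet is meant to provide. A real tool would then extend and score such seeds in the original alphabet.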

Conclusions: RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Amino Acid Sequence*
Metagenomics
Molecular Sequence Data
Proteins / chemistry*
Search Engine
Sequence Alignment / methods
Sequence Analysis, Protein / methods*
Software*

Substances

Proteins

Grants and funding

1R01HG004908/HG/NHGRI NIH HHS/United States