MapReduce implementation of a hybrid spectral library-database search method for large-scale peptide identification

Ananth Kalyanaraman; William R Cannon; Benjamin Latt; Douglas J Baxter

doi:10.1093/bioinformatics/btr523

Bioinformatics. 2011 Nov 1;27(21):3072-3. doi: 10.1093/bioinformatics/btr523. Epub 2011 Sep 16.

Authors

Ananth Kalyanaraman¹, William R Cannon, Benjamin Latt, Douglas J Baxter

Affiliation

¹ School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164-2752, USA.

Abstract

Summary: A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours when processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral datasets from environmental microbial communities as inputs.
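Illustration: The map/reduce decomposition described in the summary can be sketched in a few lines of Python. This is only a minimal in-process simulation under stated assumptions, not the MR-MSPolygraph code: it does not use Hadoop, the candidate records and field names (`CANDIDATES`, `source`, `peaks`) are hypothetical, and a simple shared-peak count stands in for the actual MSPolygraph scoring, which the abstract does not describe. Each experimental spectrum is mapped independently against candidates drawn from both a sequence database ("db") and a spectral library ("lib"), and the reduce step keeps the best-scoring hit per spectrum.

```python
from collections import defaultdict

# Hypothetical toy candidates with made-up masses and fragment m/z values.
# "db" stands in for sequence-database candidates, "lib" for library spectra.
CANDIDATES = [
    {"peptide": "PEPTIDEA", "source": "db", "mass": 800.0,
     "peaks": [100.0, 200.0, 300.0]},
    {"peptide": "PEPTIDEB", "source": "lib", "mass": 800.2,
     "peaks": [100.0, 250.0, 300.0, 400.0]},
    {"peptide": "PEPTIDEC", "source": "db", "mass": 950.0,
     "peaks": [100.0, 200.0, 300.0]},
]


def shared_peaks(observed, theoretical, tol=0.5):
    """Count candidate peaks that have an observed peak within tol (assumed score)."""
    return sum(1 for t in theoretical
               if any(abs(t - o) <= tol for o in observed))


def map_spectrum(spectrum, candidates, mass_tol=1.0):
    """Map step: emit (spectrum id, hit) for each candidate in the mass window."""
    for cand in candidates:
        if abs(cand["mass"] - spectrum["mass"]) <= mass_tol:
            score = shared_peaks(spectrum["peaks"], cand["peaks"])
            yield spectrum["id"], (score, cand["peptide"], cand["source"])


def reduce_hits(spectrum_id, hits):
    """Reduce step: keep the highest-scoring hit for one spectrum."""
    return spectrum_id, max(hits)


def run_job(spectra, candidates):
    """Simulate the shuffle by grouping mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for spectrum in spectra:
        for key, value in map_spectrum(spectrum, candidates):
            grouped[key].append(value)
    return dict(reduce_hits(key, values) for key, values in grouped.items())


spectra = [
    {"id": "s1", "mass": 800.1, "peaks": [100.1, 250.2, 300.0, 399.8]},
    {"id": "s2", "mass": 949.8, "peaks": [100.0, 199.9]},
]
results = run_job(spectra, CANDIDATES)
for spectrum_id, hit in sorted(results.items()):
    print(spectrum_id, hit)
```

Because each spectrum is scored without reference to any other, the map calls share no state, which is the property that lets this kind of workload spread across many cores.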

Availability: The source code, along with user documentation, is available at http://compbio.eecs.wsu.edu/MR-MSPolygraph.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Databases, Protein
Mass Spectrometry / methods*
Peptides / chemistry*
Sequence Analysis, Protein
Software*

Substances

Peptides