Combined dynamic arrays for storing and searching semi-ordered tandem mass spectrometry data

Jian Feng; Daniel Q Naiman; Bret Cooper

doi:10.1089/cmb.2008.0011

J Comput Biol. 2008 May;15(4):457-68. doi: 10.1089/cmb.2008.0011.

Authors

Jian Feng¹, Daniel Q Naiman, Bret Cooper

Affiliation

¹ Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland, USA.

PMID: 18435569
DOI: 10.1089/cmb.2008.0011

Abstract

When performing bioinformatics analysis on tandem mass spectrometry data, there is a computational need to efficiently store and sort these semi-ordered datasets. To solve this problem, a new data structure based on dynamic arrays was designed and implemented in an algorithm that parses semi-ordered data produced by Mascot, a separate software program that matches peptide tandem mass spectra to protein sequences in a database. By accommodating the special features of these large datasets, the combined dynamic array (CDA) provides efficient searching and insertion operations. Operations on real datasets using this new data structure are hundreds of times faster than operations using binary tree and red-black tree structures, and the difference becomes more significant as the dataset size grows. This data structure may be useful for improving the speed of other related types of protein assembly software or other types of software that operate on datasets with similar semi-ordered features.
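
The abstract does not describe the internal layout of the CDA, so the following is only a minimal sketch of the general idea of exploiting semi-ordered input with dynamic arrays, not the authors' implementation. It assumes that "semi-ordered" means the keys arrive as a small number of ascending runs. The class name `RunStore` and its methods are hypothetical and chosen for illustration only.

```python
from bisect import bisect_left


class RunStore:
    """Hypothetical illustration only (not the published CDA).

    Keys are kept in a list of ascending runs, each run being a
    dynamic array (a Python list).
    """

    def __init__(self):
        self.runs = []  # list of ascending lists

    def insert(self, key):
        # Fast path: the key continues the most recent ascending run,
        # so it is appended in amortized constant time.
        if self.runs and self.runs[-1][-1] <= key:
            self.runs[-1].append(key)
        else:
            # The order was broken, so a new run is started.
            self.runs.append([key])

    def contains(self, key):
        # Binary search inside every run whose range could hold the key.
        for run in self.runs:
            if run[0] <= key <= run[-1]:
                i = bisect_left(run, key)
                if i < len(run) and run[i] == key:
                    return True
        return False


store = RunStore()
for k in [1, 2, 5, 9, 3, 4, 10, 12]:
    store.insert(k)
```

Under the stated assumption, insertion is an append and a lookup costs one binary search per run, so the approach is only attractive when the number of runs stays small relative to the number of keys.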

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Amino Acid Sequence
Computational Biology
Molecular Sequence Data
Sequence Analysis, Protein
Software*
Statistics as Topic / methods*
Tandem Mass Spectrometry*