Background: MultiAlign is a free software tool that aligns multiple liquid chromatography-mass spectrometry datasets to one another by clustering mass and chromatographic elution features across datasets. Applicable to both label-free proteomics and metabolomics comparative analyses, the software can be operated in several modes. For example, clustered features can be matched to a reference database to identify analytes, used to generate abundance profiles, linked to tandem mass spectra based on parent precursor masses, and culled for targeted liquid chromatography-tandem mass spectrometric analysis. MultiAlign is also capable of tandem mass spectral clustering to describe proteome structure and find similarity in subsequent sample runs.
Results: MultiAlign was applied to two large proteomics datasets obtained from liquid chromatography-mass spectrometry analyses of environmental samples. Peptides in the datasets for a microbial community that had a known metagenome were identified by matching mass and elution time features to those in an established reference peptide database. Results compared favorably with those obtained using existing tools such as VIPER, but with the added benefit of being able to trace clusters of peptides across conditions to existing tandem mass spectra. MultiAlign was further applied to detect clusters across experimental samples derived from a reactor biomass community for which no metagenome was available. Several clusters were culled for further analysis to explore changes in the community structure. Lastly, MultiAlign was applied to liquid chromatography-mass spectrometry-based datasets obtained from a previously published study of wild type and mitochondrial fatty acid oxidation enzyme knockdown mutants of human hepatocarcinoma to demonstrate its utility for analyzing metabolomics datasets.
Conclusion: MultiAlign is an efficient software package for finding similar analytes across multiple liquid chromatography-mass spectrometry feature maps, as demonstrated here for both proteomics and metabolomics experiments. The software is particularly useful for proteomic studies where little or no genomic context is known, such as with environmental proteomics.