The computational simulation of complete proteomic data sets is presented, together with its utility for validating detection and interpretation algorithms, aiding the design of experiments, and assessing protein and peptide false discovery rates. The simulation software has been developed to emulate data originating from data-dependent and data-independent LC-MS workflows. Data from all types of commonly used hybrid mass spectrometers can be simulated. The algorithms are based on empirically derived physicochemical liquid- and gas-phase models for proteins and peptides. Sample composition in terms of complexity and dynamic range, as well as chromatographic, experimental and MS conditions, can be controlled and adjusted independently. The effect of on-column amounts, gradient length, mass resolution and ion mobility on search specificity is demonstrated using tryptic peptides from human and yeast cellular lysates simulated over five orders of magnitude in dynamic range. Initial justification of the simulated data sets is provided by comparing the in silico data with experimentally derived results from a 48-protein mixture spanning a similar dynamic range of five orders of magnitude. Additionally, experimental data from replicate and dilution series experiments are used to determine error rates at the peptide and protein level with respect to mass, area, retention time and drift time. The data presented reveal a high degree of similarity between simulated and experimental results at the ion detection, peptide and protein level when analyzed under similar conditions.
Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.