Bottom-up proteomics is increasingly used to characterize unknown environmental, clinical, and forensic samples. Proteomics-based bacterial identification typically proceeds by tabulating peptide "hits" (i.e., confidently identified peptides) associated with the organisms in a database; organisms with enough hits are declared present in the sample. This approach has proven successful in laboratory studies; however, important research gaps remain. First, the common practice of relying on unique peptides for identification is susceptible to a phenomenon known as signal erosion. Second, no general guidelines are available for determining how many hits are needed to make a confident identification. These gaps inhibit the transition of this approach to real-world forensic samples, where conditions vary and large databases may be needed. In this work, we propose statistical criteria that overcome the problem of signal erosion and can be applied regardless of sample quality or data analysis pipeline. These criteria are straightforward, yielding a p-value for each organism or toxin identification. We test the proposed criteria on 919 LC-MS/MS data sets, acquired on multiple data collection platforms, from 2 toxins and 32 bacterial strains. Results reveal a >95% correct species-level identification rate, demonstrating the effectiveness and robustness of proteomics-based organism/toxin identification.
Keywords: bottom-up proteomics; environmental proteomics; forensic proteomics; metaproteomics; microorganism identification; toxin identification.