A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms

Matthew The; Fredrik Edfors; Yasset Perez-Riverol; Samuel H Payne; Michael R Hoopmann; Magnus Palmblad; Björn Forsström; Lukas Käll

doi:10.1021/acs.jproteome.7b00899

J Proteome Res. 2018 May 4;17(5):1879-1886. doi: 10.1021/acs.jproteome.7b00899. Epub 2018 Apr 16.

Authors

Matthew The¹, Fredrik Edfors¹, Yasset Perez-Riverol², Samuel H Payne³, Michael R Hoopmann⁴, Magnus Palmblad⁵, Björn Forsström¹, Lukas Käll¹

Affiliations

¹ Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden.
² European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
³ Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States.
⁴ Institute for Systems Biology, Seattle, Washington 98109, United States.
⁵ Center for Proteomics and Metabolomics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands.

Abstract

A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides, and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, which is based on E. coli-expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.
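
The shared-peptide problem described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the protein names, the peptide strings, and the two deliberately naive inference rules are hypothetical and are not the procedures benchmarked in the paper. It only shows why a present protein that shares a peptide with a known absent protein can expose false positives that a standard with unique peptides would hide.

```python
from collections import defaultdict

# Hypothetical toy data: protein -> set of tryptic peptides.
# "A" and "B" are present in the sample; "B_homolog" is known to be absent
# but shares one peptide with "B".
PROTEIN_PEPTIDES = {
    "A": {"PEPTIDEK", "UNIQUEAR"},
    "B": {"SHAREDK", "ONLYBR"},
    "B_homolog": {"SHAREDK", "ONLYHOMOLOGR"},
}
PRESENT = {"A", "B"}
DETECTED = {"PEPTIDEK", "SHAREDK", "ONLYBR"}


def peptide_to_proteins(protein_peptides):
    """Invert the protein -> peptides map into peptide -> proteins."""
    index = defaultdict(set)
    for protein, peptides in protein_peptides.items():
        for peptide in peptides:
            index[peptide].add(protein)
    return index


def infer_proteins(detected, protein_peptides, use_shared):
    """Naive inference (an illustrative assumption, not a published method).

    use_shared=True:  every protein containing a detected peptide is reported.
    use_shared=False: only peptides unique to one protein count as evidence.
    """
    index = peptide_to_proteins(protein_peptides)
    inferred = set()
    for peptide in detected:
        candidates = index.get(peptide, set())
        if use_shared or len(candidates) == 1:
            inferred |= candidates
    return inferred


def false_positives(inferred, present):
    """Proteins reported by the inference step that are known to be absent."""
    return inferred - present


with_shared = infer_proteins(DETECTED, PROTEIN_PEPTIDES, use_shared=True)
unique_only = infer_proteins(DETECTED, PROTEIN_PEPTIDES, use_shared=False)
print(sorted(false_positives(with_shared, PRESENT)))
print(sorted(false_positives(unique_only, PRESENT)))
```

In this toy case the rule that uses shared peptides reports the absent homolog, while the unique-peptide rule does not. Real inference methods are considerably more sophisticated, so this says nothing about their actual error rates.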

Keywords: benchmark; homology; mass spectrometry; peptide; protein inference; protein standard; proteoform; proteomics; sample of known content.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Benchmarking / methods*
Benchmarking / standards
Escherichia coli / metabolism
Humans
Peptide Fragments / analysis
Peptides / analysis
Proteins / analysis
Proteins / metabolism
Proteomics / methods*
Sequence Homology, Amino Acid*
Trypsin / metabolism

Substances

Peptide Fragments
Peptides
Proteins
Trypsin
