DIA proteomics data from a UPS1-spiked E.coli protein mixture processed with six software tools

Clarisse Gotti; Florence Roux-Dalvai; Charles Joly-Beauparlant; Loïc Mangnier; Mickaël Leclercq; Arnaud Droit

doi:10.1016/j.dib.2022.107829


Data Brief. 2022 Jan 31:41:107829. doi: 10.1016/j.dib.2022.107829. eCollection 2022 Apr.

Authors

Clarisse Gotti¹ ², Florence Roux-Dalvai¹ ², Charles Joly-Beauparlant², Loïc Mangnier², Mickaël Leclercq², Arnaud Droit¹ ²

Affiliations

¹ Proteomics Platform, CHU de Québec - Université Laval Research Centre, Québec City, Québec G1V 4G2, Canada.
² Computational Biology Laboratory, CHU de Québec - Université Laval Research Centre, Québec City, Québec G1V 4G2, Canada.

Abstract

In this article, we provide a proteomic reference dataset that was originally generated to benchmark software tools for Data-Independent Acquisition (DIA) analysis. This large dataset includes 96 DIA .raw files acquired from a complex proteomic standard composed of an E. coli protein background spiked with 8 different concentrations of 48 human proteins (UPS1, Sigma). These 8 samples were analyzed in triplicate on an Orbitrap mass spectrometer with 4 different DIA window schemes (8 samples × 3 replicates × 4 schemes = 96 files). We also provide the spectral libraries and FASTA file used for their analysis, and the outputs of the six software tools used in this study: DIA-NN, Spectronaut, ScaffoldDIA, DIA-Umpire, Skyline and OpenSWATH. The dataset also contains post-processed quantification tables in which the peptides and proteins have been validated, their intensities normalized and the missing values imputed with a noise value. All the files are available on ProteomeXchange. Altogether, these files represent the most comprehensive DIA reference dataset acquired on an Orbitrap instrument ever published. It will be a useful resource for proteomics scientists assessing the performance of DIA software tools or testing their processing pipelines, for software developers improving their tools or developing new ones, and for students training in proteomics data analysis.

Keywords: Complex proteomic standard; Data Independent Acquisition; Software tools benchmark; Spiked UPS1 human proteins.