CAPER 3.0: A Scalable Cloud-Based System for Data-Intensive Analysis of Chromosome-Centric Human Proteome Project Data Sets

Shuai Yang; Xinlei Zhang; Lihong Diao; Feifei Guo; Dan Wang; Zhongyang Liu; Honglei Li; Junjie Zheng; Jingshan Pan; Edouard C Nice; Dong Li; Fuchu He

doi:10.1021/pr501335w

J Proteome Res. 2015 Sep 4;14(9):3720-8. doi: 10.1021/pr501335w. Epub 2015 Mar 27.

Authors

Shuai Yang¹,², Xinlei Zhang³, Lihong Diao¹,², Feifei Guo¹,²,⁴, Dan Wang¹,², Zhongyang Liu¹,², Honglei Li³, Junjie Zheng¹,², Jingshan Pan⁵, Edouard C Nice⁶, Dong Li¹,², Fuchu He¹,²

Affiliations

¹ State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing 100850, China.
² National Center for Protein Sciences Beijing, Beijing 102206, China.
³ Beijing Genestone Technology Ltd., Beijing 100085, China.
⁴ Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, China.
⁵ Shandong Computer Science Center (National Supercomputer Center in Jinan), Shandong 250101, China.
⁶ Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia.

PMID: 25794139

Abstract

The Chromosome-centric Human Proteome Project (C-HPP) aims to catalog genome-encoded proteins using a chromosome-by-chromosome strategy. As the C-HPP proceeds, the growing need for data-intensive analysis of MS/MS data poses a challenge to the proteomics community, especially to small laboratories that lack computational infrastructure. To address this challenge, we have upgraded the CAPER browser to CAPER 3.0, a scalable cloud-based system for data-intensive analysis of C-HPP data sets. CAPER 3.0 uses cloud computing technology to facilitate MS/MS-based peptide identification, and it can use both public and private clouds. It provides a graphical user interface (GUI) that helps users transfer data, configure jobs, track progress, and visualize the results comprehensively. These features enable users without programming expertise to conduct data-intensive analyses easily. Here, we illustrate the usage of CAPER 3.0 with four specific data-intensive mass spectrometry problems: detecting novel peptides, identifying single amino acid variants (SAVs) derived from known missense mutations, identifying sample-specific SAVs, and identifying exon-skipping events. CAPER 3.0 is available at http://prodigy.bprc.ac.cn/caper3.

Keywords: Chromosome-centric Human Proteome Project; Proteomic data analysis platform; big data; bioinformatics; cloud computing; proteomic data visualization.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Chromosome Mapping*
Cloud Computing*
Databases, Protein*
Humans
Molecular Sequence Data
Polymorphism, Genetic
Proteins / chemistry
Proteins / genetics*
Proteome*

Substances

Proteins
Proteome