g2pDB: A Database Mapping Protein Post-Translational Modifications to Genomic Coordinates

Sarah Keegan; John P Cortens; Ronald C Beavis; David Fenyö

doi:10.1021/acs.jproteome.5b01018


J Proteome Res. 2016 Mar 4;15(3):983-90. doi: 10.1021/acs.jproteome.5b01018. Epub 2016 Feb 18.

Authors

Sarah Keegan¹, John P Cortens², Ronald C Beavis², David Fenyö¹

Affiliations

¹ Center for Health Informatics and Bioinformatics, New York University Medical School, 227 East 30 Street, New York, New York 10016, United States.
² Department of Biochemistry and Medical Genetics, University of Manitoba, Faculty of Health Sciences, 744 Bannatyne Avenue, Winnipeg, MB R3E 0W3, Canada.

PMID: 26842767
DOI: 10.1021/acs.jproteome.5b01018

Abstract

Large-scale proteomics has made it possible to broadly screen samples for the presence of many types of post-translational modifications, such as phosphorylation, acetylation, and ubiquitination. This type of data has allowed the localization of these modifications either to a specific site on a proteolytically generated peptide or to within a small domain on the peptide. The resulting modification acceptor sites can then be mapped onto the appropriate protein sequences and the information archived. This paper describes the use of a very large archive of experimental observations of human post-translational modifications to create a map of the most reproducible modification observations onto the complete set of human protein sequences. This set of modification acceptor sites was then directly translated into the genomic coordinates of the codons for the residues at those sites. We constructed the database g2pDB using this protein-to-codon site mapping information. The information in g2pDB has been made available through a RESTful-style API, allowing researchers to determine which specific protein modifications would be perturbed by a set of observed nucleotide variants determined by high-throughput DNA or RNA sequencing.
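The protein-to-codon translation described above can be illustrated with a minimal Python sketch. It is not the g2pDB implementation and does not call the g2pDB API; the function names, the interval representation (1-based, inclusive coding segments of each exon), and the example coordinates are all assumptions made for illustration, and the chromosome is ignored for brevity. The first function finds the three genomic positions of the codon for a given residue, including codons split across an exon boundary; the second reports which modification sites have a codon overlapped by an observed variant position.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: 1-based, inclusive genomic interval (start <= end)
# covering only the coding part of an exon.
Interval = Tuple[int, int]


def codon_genomic_positions(
    residue: int, cds_exons: List[Interval], strand: str
) -> List[int]:
    """Return the three genomic positions of the codon for a 1-based residue.

    Positions are returned in transcript (5'->3') order, so on the minus
    strand they decrease.
    """
    if residue < 1:
        raise ValueError("residue numbering is 1-based")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    # Put the coding segments in transcript order.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    # 0-based offsets of the three codon bases within the spliced CDS.
    wanted = [3 * (residue - 1) + i for i in range(3)]

    positions: List[int] = []
    consumed = 0  # number of CDS bases in the segments already walked
    for start, end in ordered:
        length = end - start + 1
        for offset in wanted:
            if consumed <= offset < consumed + length:
                within = offset - consumed
                positions.append(start + within if strand == "+" else end - within)
        consumed += length

    if len(positions) != 3:
        raise ValueError("residue lies outside the coding sequence")
    return positions


def sites_hit_by_variants(
    site_codons: Dict[str, List[int]], variant_positions: Iterable[int]
) -> List[str]:
    """Return labels of sites whose codon contains at least one variant position."""
    variants = set(variant_positions)
    return [label for label, codon in site_codons.items() if variants & set(codon)]


# Illustrative coordinates only; they do not correspond to a real gene.
exons = [(101, 104), (201, 210)]
sites = {
    "residue 1": codon_genomic_positions(1, exons, "+"),
    "residue 2": codon_genomic_positions(2, exons, "+"),  # codon spans two exons
}
perturbed = sites_hit_by_variants(sites, [201, 500])
```

Storing the codon positions explicitly, rather than only the first base, keeps the variant lookup correct for codons interrupted by an intron, where the three bases are not contiguous on the genome.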

Keywords: REST API; acetylation; genome coordinate; phosphorylation; post-translational modification; protein coordinate; single nucleotide variant; ubiquitination.

MeSH terms

Acetylation
Amino Acid Sequence
Databases, Protein*
Humans
Molecular Sequence Annotation
Peptide Mapping
Phosphorylation
Protein Processing, Post-Translational*
Proteomics
Software