Almost all regulatory processes in biology ultimately lead to or originate from modifications of protein function. However, it is unclear to what extent each mechanism of regulation actually affects proteins and thus phenotypes. We assessed the extent of N-terminal protein truncation in a global analysis of N-terminomics data and found that most proteins have N-terminally truncated proteoforms. Because N-terminomics analyses do not identify the process that generated the identified N-termini, we compared the identified termini with the three N-terminus-generating events: protein cleavage, alternative translation, and alternative splicing. Of these, we sought to identify the most likely cause of N-terminal protein truncations in the human proteome. We found that protease cleavage and alternative protein translation are the likely causes of most shortened proteoforms. However, the vast majority (about 90%) of N-termini remain unexplained by any of the processes identified to date, thus revealing large gaps in our knowledge of protein termini and their genesis. Further analysis and annotation of terminomics data are required, and to this end we have created the TopFIND database, a major systematic annotation effort for protein termini. We outline the new features in version 3.0 of the updated database and the new bioinformatics tools available, and we encourage submission of generated data to fill current knowledge gaps.
Keywords: Alternative translation; Protease cleavage; Protease web; Systems biology; TAILS; Terminomics; TopFIND.
© 2015 The Authors. Proteomics published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.