Citrullination is an important post-translational modification implicated in many diseases, including rheumatoid arthritis (RA), Alzheimer's disease, and cancer. Neutrophils and mast cells have different expression profiles for protein-arginine deiminases (PADs), and ionomycin-induced activation makes them an ideal cellular model for studying proteins susceptible to citrullination. We performed high-resolution mass spectrometry with stringent data filtration to identify citrullination sites in neutrophils and mast cells treated with or without ionomycin. We identified a total of 833 validated citrullination sites on 395 proteins. Several of these citrullinated proteins are important components of pathways involved in innate immune responses. Using this benchmark primary sequence data set, we developed machine learning models to predict citrullination in neutrophil and mast cell proteins. On independent validation sets, our models predict citrullination likelihood with areas under the receiver operating characteristic curve (AUCs) of 0.735 and 0.766, respectively. In summary, this study provides the largest number of validated citrullination sites in neutrophil and mast cell proteins. Our novel motif analysis approach to predicting citrullination sites will facilitate the discovery of new PAD substrates, which may be key to understanding the immunopathology of various diseases.
Keywords: citrullination; deimination; ionomycin; machine learning; mass spectrometry; mast cell; neutral loss; neutrophil; protein-arginine deiminases.
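
For readers unfamiliar with the AUC metric quoted in the abstract, the following is a minimal sketch of how the area under the receiver operating characteristic curve can be computed from predicted scores. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical and purely illustrative. The sketch relies on the standard equivalence between the AUC and the probability that a randomly chosen positive example (here, a citrullinated site) scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = positive, 0 = negative).

    Uses the pairwise-ranking definition: the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score, counting ties as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = citrullinated arginine, 0 = unmodified arginine.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.8, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values of 0.735 and 0.766 indicate that a model ranks true sites above non-sites substantially more often than chance. The pairwise loop is quadratic in the number of examples; for large data sets a rank-based implementation would be preferable.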