Discovery of novel antigens associated with infectious diseases is fundamental to the development of serodiagnostic tests and protein subunit vaccines against existing and emerging pathogens. Efforts to predict antigenicity have so far relied on a few computational algorithms that predict signal peptide sequences (SignalP), transmembrane domains, or subcellular localization (pSort). A protein microarray approach was developed to scan the entire proteome of any infectious microorganism and empirically determine immunoglobulin reactivity in infected individuals against all the antigens from that microorganism. The current database from this activity contains quantitative antibody reactivity data against 35,000 proteins derived from 25 infectious microorganisms, comprising more than 30 million data points derived from 15,000 patient sera. Interrogation of these data sets has revealed ten proteomic features that are associated with antigenicity, allowing an in silico approach, based on protein sequence and functional annotation, to separate the proteins least likely to be antigenic from those more likely to be antigenic. The first iteration of this approach, applied to Brucella melitensis, selected 37% of the bacterial proteome, a subset containing 91% of the antigens empirically identified by probing proteome microarrays. In this study, we describe a naïve Bayes classification approach that assigns each protein in a bacterial proteome a relative score for the likelihood that it will be immunoreactive and serodiagnostic. This algorithm selected 20% of the B. melitensis proteome, including 91% of the serodiagnostic antigens, a nearly twofold improvement in the specificity of the predictor. These results give us confidence that continued development of this approach will further improve the sensitivity and specificity of this in silico predictive algorithm.
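The scoring idea can be illustrated with a minimal sketch of a Bernoulli naïve Bayes classifier over binary proteomic features. It is an illustration under stated assumptions, not the model described in the study: the three feature columns, the toy training data, the Laplace smoothing constant `alpha`, and the function names `train_naive_bayes`, `log_odds`, and `triage` are all hypothetical, and the ten features actually used are not listed in the abstract.

```python
import math

def train_naive_bayes(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary feature vectors.

    X: list of 0/1 feature lists; y: list of 0/1 labels (1 = antigenic).
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    n_features = len(X[0])
    model = {"prior": {}, "p": {}}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        # Smoothed class prior P(class = c)
        model["prior"][c] = (len(rows) + alpha) / (len(X) + 2 * alpha)
        # Smoothed per-feature probability P(feature_j = 1 | class = c)
        model["p"][c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
    return model

def log_odds(model, x):
    """Relative score: log P(antigenic | x) - log P(not antigenic | x)."""
    score = math.log(model["prior"][1]) - math.log(model["prior"][0])
    for j, value in enumerate(x):
        p1, p0 = model["p"][1][j], model["p"][0][j]
        if value:
            score += math.log(p1 / p0)
        else:
            score += math.log((1 - p1) / (1 - p0))
    return score

def triage(model, proteins, fraction):
    """Return the ids of the top-scoring fraction of (id, features) pairs."""
    ranked = sorted(proteins, key=lambda item: log_odds(model, item[1]), reverse=True)
    keep = max(1, round(fraction * len(ranked)))
    return [protein_id for protein_id, _ in ranked[:keep]]

# Hypothetical feature columns: [signal_peptide, transmembrane_domain, outer_membrane]
X_train = [[1, 1, 1], [1, 0, 1], [1, 1, 0],
           [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
y_train = [1, 1, 1, 0, 0, 0, 0, 0]  # toy labels, not real microarray data

model = train_naive_bayes(X_train, y_train)
proteome = [("p1", [1, 1, 1]), ("p2", [0, 0, 0]), ("p3", [1, 0, 1]),
            ("p4", [0, 1, 0]), ("p5", [0, 0, 0])]
shortlist = triage(model, proteome, fraction=0.4)
```

Ranking by log-odds and keeping a fixed top fraction mirrors the way the abstract reports results, as the share of the proteome selected and the share of known antigens it contains. A real application would set the cutoff from held-out microarray data.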
Copyright © 2012 Verlag Helvetica Chimica Acta AG, Zürich.