Motivation: An important application of protein microarray data analysis is identifying a serodiagnostic antigen set that can reliably detect patterns and classify antigen expression profiles. This work addresses this problem using antibody responses to protein markers measured by a novel high-throughput microarray technology. The findings from this study have direct relevance to rapid, broad-based diagnostic and vaccine development.
Results: Protein microarray chips are probed with sera from individuals infected with the bacteria Francisella tularensis, a category A biodefense pathogen. A two-step approach to the diagnostic process is presented (1) feature (antigen) selection and (2) classification using antigen response measurements obtained from F.tularensis microarrays (244 antigens, 46 infected and 54 healthy human sera measurements). To select antigens, a ranking scheme based on the identification of significant immune responses and differential expression analysis is described. Classification methods including k-nearest neighbors, support vector machines (SVM) and k-Means clustering are applied to training data using selected antigen sets of various sizes. SVM based models yield prediction accuracy rates in the range of approximately 90% on validation data, when antigen set sizes are between 25 and 50. These results strongly indicate that the top-ranked antigens can be considered high-priority candidates for diagnostic development.
Availability: All software programs are written in R and available at http://www.igb.uci.edu/index.php?page=tools and at http://www.r-project.org.
Supplementary information: Supplementary data are available at Bioinformatics online.