Background: Genome-wide transcriptional profiling of patient blood samples offers a powerful tool to investigate underlying disease mechanisms and to inform personalized treatment decisions. Most studies are based on analysis of total peripheral blood mononuclear cells (PBMCs), a mixed population. Accuracy is therefore inherently limited, since cell subset-specific differential expression of gene signatures is diluted by RNA from other cells. While using specific PBMC subsets for transcriptional profiling would improve our ability to extract knowledge from these data, it is rarely obvious which cell subset(s) will be the most informative.
Results: We have developed a computational method (Subset Prediction from Enrichment Correlation, SPEC) to predict the cellular source for a pre-defined list of genes (i.e. a gene signature) using only data from total PBMCs. SPEC does not rely on the occurrence of cell subset-specific genes in the signature, but rather takes advantage of correlations with subset-specific genes across a set of samples. Validation using multiple experimental datasets demonstrates that SPEC can accurately identify the source of a gene signature as myeloid or lymphoid, as well as differentiate between B cells, T cells, NK cells and monocytes. Using SPEC, we predict that myeloid cells are the source of the interferon-therapy response gene signature associated with HCV patients who are non-responsive to standard therapy.
Conclusions: SPEC is a powerful technique for blood genomic studies. It can help identify specific cell subsets that are important for understanding disease and therapy response. Because SPEC requires only gene expression profiles from total PBMCs, it is widely applicable and can readily be used to mine the large body of existing microarray or RNA-seq data.
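
To make the idea in the Results concrete, the following is a minimal sketch of correlation-based source prediction, not the published SPEC implementation. The abstract does not specify the enrichment statistic, so this sketch assumes a simple stand-in: each gene set is scored per sample as the mean of its z-scored expression, and the signature score is then correlated (Pearson) with each subset's marker score across samples. The function names, the gene names and the toy data are all hypothetical.

```python
import numpy as np


def zscore_rows(expr):
    """Standardize each gene (row) across samples (columns)."""
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    return (expr - mean) / std


def predict_subset(expr, gene_names, signature, subset_markers):
    """Rank cell subsets by how well their marker score tracks the signature.

    Assumption: a gene set's per-sample score is the mean z-score of its
    genes. This is a simplified stand-in for an enrichment statistic.
    """
    index = {gene: i for i, gene in enumerate(gene_names)}
    z = zscore_rows(np.asarray(expr, dtype=float))

    def score(genes):
        rows = [index[g] for g in genes if g in index]
        if not rows:
            raise ValueError("none of the genes are present in the matrix")
        return z[rows].mean(axis=0)

    sig_score = score(signature)
    correlations = {
        subset: float(np.corrcoef(sig_score, score(markers))[0, 1])
        for subset, markers in subset_markers.items()
    }
    best = max(correlations, key=correlations.get)
    return best, correlations


# Toy total-PBMC data: 12 samples with varying (hypothetical) subset content.
rng = np.random.default_rng(0)
mono = np.linspace(0.0, 1.0, 12)   # monocyte-driven signal per sample
bcell = np.tile([0.2, 0.8], 6)     # B-cell-driven signal per sample


def noise():
    return rng.normal(0.0, 0.05, 12)


gene_names = ["MONO_1", "MONO_2", "BCELL_1", "BCELL_2", "SIG_A", "SIG_B"]
expr = np.vstack([
    mono + noise(), mono + noise(),          # monocyte marker genes
    bcell + noise(), bcell + noise(),        # B-cell marker genes
    2 * mono + noise(), 2 * mono + noise(),  # signature genes (not markers)
])

best, correlations = predict_subset(
    expr,
    gene_names,
    signature=["SIG_A", "SIG_B"],
    subset_markers={
        "monocyte": ["MONO_1", "MONO_2"],
        "b_cell": ["BCELL_1", "BCELL_2"],
    },
)
print(best, correlations)
```

Note that the signature genes in the toy data are not themselves subset markers. The prediction rests only on their co-variation with marker genes across samples, which is the property the abstract highlights. Because the toy signature is constructed to follow the monocyte signal, the monocyte subset ranks first here.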