Mature T cells express either CD8 or CD4, defining two physiologically distinct populations of T cells. CD8+ T cells, or killer T cells, and CD4+ T cells, or helper T cells, carry out different aspects of T cell-mediated adaptive immunity. Currently, determining the ratio of CD4+ to CD8+ T cells requires flow cytometry or immunohistochemistry. The genomic T cell receptor locus is rearranged during T cell maturation, generating a highly variable T cell receptor locus in each mature T cell. During thymic maturation, T cells that will become CD4+ and those that will become CD8+ are subjected to different selective pressures. In this study, we apply high-throughput next-generation sequencing to T cells from both a healthy cohort and a cohort with an autoimmune disease (multiple sclerosis) to identify sequence features in the variable CDR3 region of the rearranged T cell receptor gene that distinguish CD4+ from CD8+ T cells. The features we identify include variable (V) gene usage and CDR3 region length. We implement a likelihood model that uses these features to estimate the relative proportions of CD4+ and CD8+ T cells. Our model accurately estimates the proportion of CD4+ and CD8+ T cell sequences in samples from healthy and diseased immune systems, and simulations indicate that it can be applied to as few as 1000 T cell receptor sequences. We validate the model using in vitro mixtures of T cell sequences and by comparing its results to flow cytometry on peripheral blood samples. We believe our computational method for determining the CD4:CD8 ratio in T cell samples from sequence data will provide additional useful information for any samples on which high-throughput T cell receptor sequencing is performed, potentially including some solid tumors.
Copyright © 2013. Published by Elsevier B.V.
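
The abstract does not give the form of the likelihood model, so the following is only a minimal sketch of one way such an estimate could be set up, not the authors' implementation. It treats each sequence as a draw from a two-component mixture and finds the CD4+ fraction that maximizes the likelihood using expectation-maximization. The feature tables (V_GENE_FREQ, CDR3_LEN_FREQ), the gene names TRBV_A and TRBV_B, all numeric values, and the assumption that V gene and CDR3 length are independent within each class are hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical per-class feature frequencies (illustrative values only,
# not taken from the study). Each inner table sums to 1.
V_GENE_FREQ = {
    "CD4": {"TRBV_A": 0.6, "TRBV_B": 0.4},
    "CD8": {"TRBV_A": 0.3, "TRBV_B": 0.7},
}
CDR3_LEN_FREQ = {
    "CD4": {12: 0.2, 13: 0.5, 14: 0.3},
    "CD8": {12: 0.4, 13: 0.4, 14: 0.2},
}


def seq_likelihood(seq, cls):
    """Likelihood of one (v_gene, cdr3_len) pair under class `cls`.

    Assumes the two features are independent within a class.
    """
    v_gene, cdr3_len = seq
    return V_GENE_FREQ[cls][v_gene] * CDR3_LEN_FREQ[cls][cdr3_len]


def estimate_cd4_fraction(seqs, iters=500, tol=1e-10):
    """Maximum-likelihood CD4+ fraction for a two-component mixture (EM)."""
    l4 = [seq_likelihood(s, "CD4") for s in seqs]
    l8 = [seq_likelihood(s, "CD8") for s in seqs]
    p = 0.5  # initial guess for the CD4+ fraction
    for _ in range(iters):
        # E-step: posterior probability that each sequence is CD4+.
        resp = [p * a / (p * a + (1.0 - p) * b) for a, b in zip(l4, l8)]
        # M-step: the new fraction is the mean posterior probability.
        new_p = sum(resp) / len(resp)
        converged = abs(new_p - p) < tol
        p = new_p
        if converged:
            break
    return p


def simulate(n, cd4_fraction, rng):
    """Draw n synthetic (v_gene, cdr3_len) pairs from the mixture."""
    seqs = []
    for _ in range(n):
        cls = "CD4" if rng.random() < cd4_fraction else "CD8"
        v_table, len_table = V_GENE_FREQ[cls], CDR3_LEN_FREQ[cls]
        v_gene = rng.choices(list(v_table), weights=list(v_table.values()))[0]
        cdr3_len = rng.choices(list(len_table), weights=list(len_table.values()))[0]
        seqs.append((v_gene, cdr3_len))
    return seqs


if __name__ == "__main__":
    rng = random.Random(0)
    sample = simulate(20000, 0.7, rng)
    print(f"estimated CD4+ fraction: {estimate_cd4_fraction(sample):.3f}")
```

Because no single sequence is strongly diagnostic in this toy setup, the estimate is only meaningful in aggregate, and its precision depends on how far apart the two classes' feature distributions are and on how many sequences are available.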