Background: Trypanosoma brucei gambiense is the causative agent of chronic Human African Trypanosomiasis or sleeping sickness, a disease endemic across often poor and rural areas of Western and Central Africa. We have previously published the genome sequence of a T. b. brucei isolate, and have now employed a comparative genomics approach to understand the scale of genomic variation between T. b. gambiense and the reference genome. We sought to identify features that were uniquely associated with T. b. gambiense and its ability to infect humans.
Methods and findings: An improved high-quality draft genome sequence for the group 1 T. b. gambiense DAL 972 isolate was produced using a whole-genome shotgun strategy. Comparison with T. b. brucei showed that sequence identity averages 99.2% in coding regions, and gene order is largely collinear. However, variation associated with segmental duplications and tandem gene arrays suggests some reduction of functional repertoire in T. b. gambiense DAL 972. A comparison of the variant surface glycoproteins (VSG) in T. b. brucei with all T. b. gambiense sequence reads showed that the essential structural repertoire of VSG domains is conserved across T. brucei.
Conclusions: This study provides the first estimate of intraspecific genomic variation within T. brucei, and so has important consequences for future population genomics studies. We have shown that the T. b. gambiense genome corresponds closely with the reference, which should therefore be an effective scaffold for any T. brucei genome sequence data. As VSG repertoire is also well conserved, it may be feasible to describe the total diversity of variant antigens. While we describe several as yet uncharacterized gene families with predicted cell surface roles that were expanded in number in T. b. brucei, no T. b. gambiense-specific gene was identified outside of the subtelomeres that could explain the ability to infect humans.