Epidemiological surveillance of animal tuberculosis (TB) based on whole genome sequencing (WGS) of Mycobacterium bovis has recently gained traction. This is owing to its high resolution for identifying infection sources, characterizing the pathogen population structure, and facilitating contact tracing. However, the workflow from bacterial isolation to sequence data analysis poses several technical challenges. These can severely limit our ability to understand the epidemiological scenario and inform outbreak response. While attempting to use archived DNA from cultured samples obtained during routine official surveillance of animal TB in Portugal, we faced three major challenges: the low amount of M. bovis DNA obtained from routinely processed animal samples; the low purity of M. bovis DNA, i.e., high levels of contamination with DNA from other organisms; and the co-occurrence of more than one M. bovis strain per sample (within-host mixed infection). The loss of isolate genomes creates missing links in transmission chain reconstruction, hampering the biological and epidemiological interpretation of the dataset as a whole. Having identified these challenges, we implemented an integrated framework based on whole genome amplification and a dedicated computational pipeline to minimize their effects and recover as many genomes as possible. With the approaches described herein, we recovered 62 of 100 samples that would otherwise have been lost. Based on these results, we discuss adjustments that official and research laboratories should make to facilitate the sequential implementation of bacteriological culture, PCR, downstream genomics, and computational methods. These steps should all fit within a time frame that supports data-driven intervention.
Keywords: Mycobacterium bovis; animal tuberculosis; computational biology; mixed infection; whole genome amplification; whole genome sequencing.