Outbreak strains of Mycobacterium tuberculosis are promising targets in the search for intrinsic determinants of transmissibility, as they are responsible for many cases with sustained transmission; however, reliance on low-resolution typing methods and geographically restricted investigations has limited the assessment of the success of long-lived outbreak strains. We can now address the nature of outbreak strains by combining large genomic data sets and phylodynamic approaches. We retrospectively sequenced the whole genomes of representative samples assigned to an outbreak that has circulated in the Canary Islands (the GC strain) since 1993 and accounts for ~20% of local tuberculosis cases. We selected a panel of specific single nucleotide polymorphism (SNP) markers for an in silico search for additional outbreak-related sequences within publicly available tuberculosis genomic data. Using this information, we inferred the origin, spread, and epidemiological parameters of the GC strain. Our approach allowed us to accurately trace the historical and more recent dispersion of the GC strain. We provide evidence that the strain has been highly successful within the Canarian archipelago but has expanded only to a limited extent abroad. Epidemiological parameters estimated from genomic data do not support a distinctive biology of the GC strain. With the increasing availability of genomic data allowing for the accurate inference of strain spread and critical epidemiological parameters, we can now revisit the link between Mycobacterium tuberculosis genotypes and transmission, as is routinely done for SARS-CoV-2 variants of concern. We demonstrate that social determinants, rather than intrinsically higher bacterial transmissibility, better explain the success of the GC strain. Importantly, our approach can be used to trace and characterize strains of interest worldwide.
IMPORTANCE Infectious disease outbreaks represent a significant problem for public health. Tracing outbreak expansion and understanding the main factors behind emergence and persistence remain critical to effective disease control. Our study shows how researchers and public health authorities can use whole-genome sequencing-based methods to trace outbreaks, and how available epidemiological information helps to evaluate the factors underpinning outbreak persistence. By taking advantage of the freely available data in public repositories, researchers can accurately establish the expansion of an outbreak beyond its original boundaries and determine the potential risk posed by a strain. This information allows health authorities to define targeted strategies to mitigate expansion and persistence. Finally, we show the need to evaluate strain transmissibility in different geographic contexts in order to unequivocally attribute spread to local or pathogen-related factors, an important lesson from the genomic surveillance of SARS-CoV-2.
Keywords: genomic epidemiology; outbreak; phylodynamics; tuberculosis; whole-genome sequencing.