Tuberculosis, a disease caused by Mycobacterium tuberculosis (Mtb), is a major health problem worldwide. Here, we developed a method to detect large insertions and deletions (indels), a class of variation that has been largely understudied. Using this framework, we performed a comprehensive analysis of single-nucleotide variants and small and large indels across 1,960 Mtb clinical isolates. A comparison of variant distributions showed that gene-disruptive variants are underrepresented in genes essential for bacterial survival. An evolutionary analysis revealed that Mtb genomes are enriched in partially deleterious mutations. Genome-wide association studies identified small and large deletions in eccB2 that are significantly associated with patient prognosis. In addition, we identified significant associations with antibiotic resistance in 23 non-canonical genes; among these, large indels were found primarily in the genetic regions of Rv1216c, Rv1217c, fadD11, and ctpD. This study provides a comprehensive catalog of genetic variation in Mtb and highlights its potential impact on the future treatment and risk prediction of tuberculosis.
Keywords: Mycobacterium tuberculosis; SNVs; indels; large insertions and deletions; non-canonical drug resistance conferring genes; single-nucleotide variations; survival associated genes; virulence factors; virulence genes.
Copyright © 2024 Elsevier Inc. All rights reserved.