Purpose: To evaluate the performance of core genome multilocus sequence typing (cgMLST) for genotyping Mycobacterium tuberculosis (M. tuberculosis) strains in regions where lineage 2 strains predominate.
Methods: We compared clustering by whole-genome SNP typing with clustering by cgMLST in an analysis of WGS data from 6240 strains collected in five regions of China. Both the receiver operating characteristic (ROC) curve and epidemiological investigation were used to determine the optimal threshold for defining genomic clustering by cgMLST. The performance of cgMLST was evaluated by quantifying the sensitivity, specificity and concordance of clustering between the two methods. Logistic regression was used to gauge the impact of strain genetic diversity and lineage on cgMLST clustering.
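As an illustration of the comparison described above, the following is a minimal sketch rather than the study's actual pipeline. It assumes that a strain counts as "clustered" when it lies within the threshold distance of at least one other strain, and that whole-genome SNP typing serves as the reference. The function names, the toy distance matrices and the SNP threshold of 12 are hypothetical; only the cgMLST threshold of 10 allelic differences is taken from the results.

```python
from typing import Dict, List, Sequence


def clustered_flags(dist: Sequence[Sequence[int]], threshold: int) -> List[bool]:
    """Flag a strain as clustered if any other strain lies within `threshold`."""
    n = len(dist)
    return [
        any(dist[i][j] <= threshold for j in range(n) if j != i)
        for i in range(n)
    ]


def compare_clustering(reference: Sequence[bool], test: Sequence[bool]) -> Dict[str, float]:
    """Strain-level sensitivity, specificity and concordance of `test` vs `reference`."""
    pairs = list(zip(reference, test))
    tp = sum(1 for r, t in pairs if r and t)
    tn = sum(1 for r, t in pairs if not r and not t)
    fp = sum(1 for r, t in pairs if not r and t)
    fn = sum(1 for r, t in pairs if r and not t)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "concordance": (tp + tn) / len(pairs),
    }


# Toy pairwise distance matrices for four strains (invented for illustration).
snp_dist = [
    [0, 5, 40, 60],
    [5, 0, 45, 70],
    [40, 45, 0, 30],
    [60, 70, 30, 0],
]
cgmlst_dist = [
    [0, 4, 9, 50],
    [4, 0, 20, 55],
    [9, 20, 0, 25],
    [50, 55, 25, 0],
]

SNP_THRESHOLD = 12     # hypothetical reference threshold, not stated in the abstract
CGMLST_THRESHOLD = 10  # <= 10 allelic differences, as reported in the results

snp_flags = clustered_flags(snp_dist, SNP_THRESHOLD)
cgmlst_flags = clustered_flags(cgmlst_dist, CGMLST_THRESHOLD)
result = compare_clustering(snp_flags, cgmlst_flags)
print(result)
```

In this toy example the third strain is clustered by cgMLST but not by SNP typing, which lowers specificity and concordance while leaving sensitivity unaffected. Repeating the calculation over a range of candidate cgMLST thresholds would give the points of a ROC curve.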
Results: The optimal threshold for cgMLST to define genomic clustering was determined to be ≤ 10 allelic differences between strains. The overall sensitivity and specificity of cgMLST averaged 99.6% and 96.3%, respectively, and the concordance of clustering between the two methods averaged 97.1%. Concordance was significantly correlated with strain genetic diversity and was 3.99 times (95% CI, 2.94-5.42) higher in regions with high genetic diversity (π > 1.55 × 10⁻⁴) than in regions with low genetic diversity. Concordance for lineage 2 strains (96.8%) was lower than that for lineage 4 strains (98.3%), but the difference was not statistically significant.
Conclusion: cgMLST showed a discriminatory power comparable to that of whole-genome SNP typing and could be used to genotype clinical M. tuberculosis strains in different regions of China. The discriminatory power of cgMLST was significantly correlated with strain genetic diversity and was slightly lower for strains from regions with low genetic diversity.
Keywords: Genotyping; Mycobacterium tuberculosis; Whole-genome sequencing; cgMLST.
© 2023. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.