mStrain: strain-level identification of Yersinia pestis using metagenomic data

Bioinform Adv. 2023 Sep 15;3(1):vbad115. doi: 10.1093/bioadv/vbad115. eCollection 2023.

Abstract

Motivation: High-resolution detection of target pathogens from metagenomic sequencing data is a major challenge because target pathogens are present at low concentrations in samples. We introduced mStrain, a novel Yersinia pestis strain/lineage-level identification tool that uses metagenomic data. mStrain successfully identified Y. pestis at the strain/lineage level by extracting sufficient information on single-nucleotide polymorphisms (SNPs), and can therefore be an effective tool for identifying and source-tracking Y. pestis from metagenomic data during a plague outbreak.

Definitions:

Strain-level identification: Assigning the reads in the metagenomic sequencing data to an exactly known or most closely representative Y. pestis strain.

Lineage-level identification: Assigning the reads in the metagenomic sequencing data to a specific lineage on the phylogenetic tree.

canoSNPs: The unique and typical SNPs present in all representative strains.

Ancestor/derived state: An SNP is defined as being in the ancestor state when its allele matches that of Yersinia pseudotuberculosis strain IP32953; otherwise, it is defined as being in the derived state.
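
The ancestor/derived definition above can be illustrated with a minimal Python sketch. It is not mStrain's implementation. The function name `call_snp_states`, the dictionary layout (genomic position mapped to a base), the example positions, and the extra "missing" label for sites without a usable base call are all assumptions made for illustration.

```python
from typing import Dict

VALID_BASES = {"A", "C", "G", "T"}


def call_snp_states(
    outgroup_alleles: Dict[int, str],
    sample_alleles: Dict[int, str],
) -> Dict[int, str]:
    """Label each SNP site as 'ancestor', 'derived', or 'missing'.

    outgroup_alleles: position -> base in the outgroup reference
        (here standing in for Y. pseudotuberculosis IP32953).
    sample_alleles: position -> base observed in the sample.
    Both layouts are hypothetical, chosen only for this example.
    """
    states: Dict[int, str] = {}
    for site, outgroup_base in outgroup_alleles.items():
        sample_base = sample_alleles.get(site, "").upper()
        if sample_base not in VALID_BASES:
            # No usable call at this site (assumed handling, not from the paper).
            states[site] = "missing"
        elif sample_base == outgroup_base.upper():
            # Same allele as the outgroup: ancestor state.
            states[site] = "ancestor"
        else:
            # Any other allele: derived state.
            states[site] = "derived"
    return states


if __name__ == "__main__":
    # Made-up positions and bases, for demonstration only.
    outgroup = {1001: "A", 2050: "G", 3300: "T"}
    sample = {1001: "A", 2050: "C"}
    for site, state in sorted(call_snp_states(outgroup, sample).items()):
        print(site, state)
```

In this sketch, sites with no base call are kept separate from derived sites, since low pathogen concentration in metagenomic samples can leave many SNP positions uncovered.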

Availability and implementation: The code for running mStrain, the test dataset, and instructions for running the code can be found at the following GitHub repository: https://github.com/xwqian1123/mStrain.