A Comparison of Three Different Bioinformatics Analyses of the 16S-23S rRNA Encoding Region for Bacterial Identification

Front Microbiol. 2019 Apr 16:10:620. doi: 10.3389/fmicb.2019.00620. eCollection 2019.

Abstract

Rapid and reliable identification of bacterial pathogens directly from patient samples is required for optimizing antimicrobial therapy. Although Sanger sequencing of the 16S ribosomal RNA (rRNA) gene is used as a molecular method, species identification and discrimination are not always achievable because the 16S rRNA genes of different bacteria sometimes have high sequence homology. Recently, next-generation sequencing (NGS) of the 16S-23S rRNA encoding region has been proposed for reliable identification of pathogens directly from patient samples. However, data analysis is laborious and time-consuming, and a database for the complete 16S-23S rRNA encoding region is not available. Therefore, a faster and more accurate approach is needed for NGS data analysis of the 16S-23S rRNA encoding region. We compared the speed and diagnostic accuracy of three data analysis approaches for the identification of bacterial species: de novo assembly followed by Basic Local Alignment Search Tool (BLAST), operational taxonomic unit (OTU) clustering, and mapping using an in-house developed 16S-23S rRNA encoding region database. De novo assembly followed by BLAST using the in-house database was superior to the other methods: it had the shortest turnaround time (2 h and 5 min), approximately 2 h less than OTU clustering and 4.5 h less than mapping, and a sensitivity of 80%. Mapping was the slowest and most laborious approach, with a sensitivity of 60%, whereas OTU clustering was the least laborious approach, with a sensitivity of 70%. Although the in-house database requires more sequence entries to improve sensitivity, the combination of de novo assembly and BLAST currently appears to be the optimal approach for data analysis.

Keywords: OTU clustering; clinical microbiology; de novo assembly; diagnostics; mapping; metagenomics; next-generation sequencing.