Don't let valuable microbiome data go to waste: combined usage of merging and direct-joining of sequencing reads for low-quality paired-end amplicon data

Biotechnol Lett. 2024 Oct;46(5):791-805. doi: 10.1007/s10529-024-03509-9. Epub 2024 Jul 6.

Abstract

Low-quality sequencing data undermine microbial diversity profiling and call for improvements to the bioinformatics workflow. The conventional merging approach discards a large share of sequencing reads when processing low-quality amplicon data, so alternative methods are needed. This study proposes a computational workflow that combines merging and direct-joining: paired-end reads that lack an overlap are concatenated and pooled with the merged sequences. The proposed strategy was compared with two other workflows: the merging approach, in which paired-end reads are merged, and the direct-joining approach, in which the reads are concatenated. The results showed that the merging approach generates significantly fewer amplicon sequences, limits microbiome inference, and obscures some microbial associations. Compared with the other workflows, the combined merging and direct-joining strategy reduces the loss of amplicon data, improves taxonomic classification, and, importantly, reduces the misleading results associated with the merging approach when analysing low-quality amplicon data. A mock community analysis also supports these findings. In summary, researchers are advised to follow the combined merging and direct-joining workflow to avoid problems associated with low-quality data when profiling microbial community structure.
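The combined strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the exact-match overlap test, the `min_overlap` threshold, and the optional `spacer` between joined reads are all assumptions made for illustration. Production read mergers score overlaps with base qualities and tolerate mismatches, which this toy version does not.

```python
from typing import List, Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_overlap(fwd: str, rev_rc: str, min_overlap: int = 10) -> int:
    """Length of the longest exact suffix(fwd)/prefix(rev_rc) match.

    Returns 0 when no overlap of at least min_overlap bases exists.
    Exact matching is a simplifying assumption of this sketch.
    """
    longest = min(len(fwd), len(rev_rc))
    for k in range(longest, min_overlap - 1, -1):
        if fwd[-k:] == rev_rc[:k]:
            return k
    return 0


def merge_or_join(fwd: str, rev: str, min_overlap: int = 10,
                  spacer: str = "") -> Tuple[str, str]:
    """Merge a read pair if it overlaps, otherwise concatenate it.

    The reverse read is reverse-complemented first so both reads share
    the forward orientation. The spacer (hypothetical parameter) is
    placed between directly joined reads; it is empty by default.
    """
    rev_rc = reverse_complement(rev)
    k = find_overlap(fwd, rev_rc, min_overlap)
    if k:
        # Overlap found: keep the forward read, append the non-overlapping tail.
        return fwd + rev_rc[k:], "merged"
    # No usable overlap: direct-join instead of discarding the pair.
    return fwd + spacer + rev_rc, "joined"


def process_pairs(pairs: List[Tuple[str, str]],
                  min_overlap: int = 10) -> List[Tuple[str, str]]:
    """Pool merged and directly joined sequences into one list."""
    return [merge_or_join(f, r, min_overlap) for f, r in pairs]
```

A merging-only workflow would drop every pair labelled "joined" here. Pooling both outcomes is what keeps those reads available for downstream profiling.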

Keywords: Amplicon; Direct-joining; Low-quality data; Merging; Microbial profiling; Microbiome.

MeSH terms

  • Bacteria / classification
  • Bacteria / genetics
  • Computational Biology* / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Microbiota* / genetics
  • Sequence Analysis, DNA / methods
  • Workflow