A Next-Generation Sequencing Data Analysis Pipeline for Detecting Unknown Pathogens from Mixed Clinical Samples and Revealing Their Genetic Diversity

PLoS One. 2016 Mar 17;11(3):e0151495. doi: 10.1371/journal.pone.0151495. eCollection 2016.

Abstract

Forty-two cytopathic effect (CPE)-positive isolates were collected from 2008 to 2012. None of these isolates could be identified as a known viral pathogen by routine diagnostic assays. To reduce sequencing costs, they were pooled into 8 groups of 5 or 6 isolates each. Next-generation sequencing (NGS) was conducted for each group of mixed samples, and the proposed data analysis pipeline was used to identify the viral pathogens in these mixed samples. Each of the 42 isolates was then tested individually by polymerase chain reaction (PCR) or enzyme-linked immunosorbent assay (ELISA), depending on the viral types predicted for its group. Two isolates remained unidentified after these tests. Iteration mapping was therefore applied to each of these 2 isolates and predicted human parechovirus (HPeV) in both. In summary, our NGS pipeline detected the following viruses among the 42 isolates: 29 human rhinoviruses (HRVs), 10 HPeVs, 1 human adenovirus (HAdV), 1 echovirus and 1 rotavirus. We then focused on the 10 identified Taiwanese HPeVs because HPeVs have been reported to be of greater clinical significance than HRVs. Their genomes were assembled and their genetic diversity was explored. A novel 6-bp deletion was found in one HPeV-1 isolate. In terms of nucleotide heterogeneity, 64 genetic variants were detected in these HPeVs from the mapped NGS reads. Most importantly, a recombination event was found between our HPeV-3 and a known HPeV-4 strain in the database. A similar event was detected in the other HPeV-3 strains in the same clade of the phylogenetic tree. These findings demonstrate that the proposed NGS data analysis pipeline can identify unknown viruses in mixed clinical samples, reveal their genetic identities and variants, and characterize their genetic features in terms of viral evolution.
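As a small worked check of the pooling arithmetic, the sketch below splits 42 isolates as evenly as possible across 8 pools. The helper name `pool_sizes` is hypothetical, and the even split is an assumption for illustration; the abstract states only that pools contained 5 or 6 isolates, not how individual isolates were assigned.

```python
from typing import List


def pool_sizes(n_isolates: int, n_pools: int) -> List[int]:
    """Hypothetical helper: split n_isolates as evenly as possible across n_pools."""
    base, extra = divmod(n_isolates, n_pools)
    # The first `extra` pools each receive one additional isolate.
    return [base + 1 if i < extra else base for i in range(n_pools)]


sizes = pool_sizes(42, 8)
print(sizes)  # [6, 6, 5, 5, 5, 5, 5, 5]
```

An even split of 42 isolates into 8 pools gives two pools of 6 and six pools of 5, which is consistent with the stated "groups of 5 or 6 isolates".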

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Cell Line
  • Cell Line, Tumor
  • Dogs
  • Enzyme-Linked Immunosorbent Assay / methods
  • Genetic Variation*
  • Genome, Viral / genetics
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Madin Darby Canine Kidney Cells
  • Phylogeny
  • Polymerase Chain Reaction / methods
  • Recombination, Genetic
  • Reproducibility of Results
  • Virus Diseases / virology*
  • Viruses / classification
  • Viruses / genetics*
  • Viruses / isolation & purification

Grants and funding

KCT received five grants from Chang Gung Memorial Hospital, Taoyuan, Taiwan (Nos. CMRPG3C0731 to CMRPG3C0733, CLRPG3B0042 and CLRPG3B0043; http://www.cgmh.org.tw). GWC received one grant from the Ministry of Science and Technology, Taiwan (No. MOST-103-2221-E-182-022; http://www.most.gov.tw/). In total, these 6 grants partially supported this work. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.