SCSIM: Jointly simulating correlated single-cell and bulk next-generation DNA sequencing data

BMC Bioinformatics. 2020 May 26;21(1):215. doi: 10.1186/s12859-020-03550-1.

Abstract

Background: Recently, it has become possible to collect next-generation DNA sequencing data sets that are composed of multiple samples from multiple biological units, where each of these samples may be from a single cell or bulk tissue. However, no tool yet exists for simulating DNA sequencing data from such a nested sampling arrangement with single-cell and bulk samples so that developers of analysis methods can assess accuracy and precision.

Results: We have developed a tool that simulates DNA sequencing data from hierarchically grouped (correlated) samples where each sample is designated bulk or single-cell. Our tool uses a simple configuration file to define the experimental arrangement and can be integrated into software pipelines for testing of variant callers or other genomic tools.
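
The idea of hierarchically grouped (correlated) bulk and single-cell samples can be illustrated with a small sketch. This is not the SCSIM implementation or its configuration format; it is a minimal toy model, assuming a single biallelic locus, a two-level Dirichlet hierarchy (population, then biological unit, then sample), and diploid single cells. The function names (dirichlet, simulate_locus), the base frequencies, and the concentration value are all hypothetical choices made for illustration.

```python
import random


def dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_locus(n_units, samples_per_unit, depth, seed=0):
    """Simulate (unit, kind, ref_reads, alt_reads) records at one locus.

    samples_per_unit is a list of sample kinds, "bulk" or "single-cell",
    applied to every biological unit. All numeric settings below are
    assumptions for illustration, not values taken from SCSIM.
    """
    rng = random.Random(seed)
    base = [0.9, 0.1]        # assumed population-level (ref, alt) frequencies
    concentration = 20.0     # assumed; larger means units resemble the base more
    records = []
    for unit in range(n_units):
        # Unit-level frequencies are drawn around the shared base, which is
        # what makes samples from the same unit correlated.
        unit_freq = dirichlet([max(concentration * p, 1e-6) for p in base], rng)
        for kind in samples_per_unit:
            if kind == "bulk":
                # Bulk sample: a mixture of many cells, so its allele
                # frequencies vary continuously around the unit's.
                freq = dirichlet(
                    [max(concentration * p, 1e-6) for p in unit_freq], rng
                )
            else:
                # Single cell: one diploid genotype, so the alt fraction
                # can only be 0, 0.5, or 1.
                alleles = rng.choices([0, 1], weights=unit_freq, k=2)
                alt_fraction = sum(alleles) / 2
                freq = [1.0 - alt_fraction, alt_fraction]
            # Each read independently carries the alt allele with prob freq[1].
            alt_reads = sum(rng.random() < freq[1] for _ in range(depth))
            records.append((unit, kind, depth - alt_reads, alt_reads))
    return records


if __name__ == "__main__":
    for record in simulate_locus(2, ["bulk", "single-cell"], depth=50, seed=1):
        print(record)
```

In this sketch the only difference between the two sample types is how the per-sample allele frequencies are obtained; read sampling is shared. A real simulator would also need to model sequencing error, amplification bias in single cells, and many loci, none of which are represented here.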

Conclusions: The DNA sequencing data generated by our simulator is representative of real data and integrates seamlessly with standard downstream analysis tools.

Keywords: DNA sequencing; Hierarchical Dirichlet; Single-cell DNA sequencing; Simulator.

MeSH terms

  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Sequence Analysis, DNA / methods*
  • Single-Cell Analysis / methods*
  • Software*