Background: It has recently become possible to collect next-generation DNA sequencing data sets composed of multiple samples from multiple biological units, where each sample may come from a single cell or from bulk tissue. However, there is not yet a tool for simulating DNA sequencing data from such a nested sampling arrangement with single-cell and bulk samples, which would let developers of analysis methods assess the accuracy and precision of their methods.
Results: We have developed a tool that simulates DNA sequencing data from hierarchically grouped (correlated) samples, in which each sample is designated as either bulk or single-cell. The tool uses a simple configuration file to define the experimental arrangement and can be integrated into software pipelines to test variant callers or other genomic tools.
Conclusions: The DNA sequencing data generated by our simulator is representative of real data and integrates seamlessly with standard downstream analysis tools.
Keywords: DNA sequencing; Hierarchical Dirichlet; Single-cell DNA sequencing; Simulator.