Identification of mixups among DNA sequencing plates

Bioinformatics. 2002 Nov;18(11):1418-26. doi: 10.1093/bioinformatics/18.11.1418.

Abstract

Motivation: High-throughput genome sequencing offers opportunities for the reagents and data belonging to a particular project to be mixed up. The sequencing templates or sequence data generated for an assembly may become contaminated with reagents or sequences from another project, resulting in lower-quality, inaccurate assemblies.

Results: We have developed a system to assess sequence assemblies and monitor for laboratory mixups. We describe several methods for testing the consistency of assemblies and for resolving those that contain mixed data. We use statistical tests to evaluate how sequencing reads from different plates are distributed among contigs, and a graph-based approach to resolve situations where data have been inappropriately combined. Although these methods were designed for a high-throughput DNA sequencing environment processing thousands of clones, they can be applied in any situation where distinct sequencing projects are performed at redundant coverage.
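The abstract does not say which statistical test or graph construction the authors use, so the following is only a minimal sketch under our own assumptions, not the paper's method. Each read is reduced to a hypothetical `(plate_id, contig_id)` pair. Consistency is scored with a Pearson chi-square statistic on the plate-by-contig table, with a permutation p-value (a swapped-in choice of test). For the graph step, two plates are linked when each contributes at least `min_reads` reads (a hypothetical threshold) to the same contig, and the connected components are taken as candidate groups of data that belong together.

```python
import random
from collections import Counter, defaultdict

# Hypothetical input format: one (plate_id, contig_id) pair per sequencing read.
Read = tuple[str, str]


def chi_square(reads: list[Read]) -> float:
    """Pearson chi-square statistic for the plate-by-contig contingency table."""
    n = len(reads)
    cell = Counter(reads)
    plate_tot = Counter(p for p, _ in reads)
    contig_tot = Counter(c for _, c in reads)
    stat = 0.0
    for p, rp in plate_tot.items():
        for c, rc in contig_tot.items():
            expected = rp * rc / n  # always > 0: every plate and contig has reads
            stat += (cell[(p, c)] - expected) ** 2 / expected
    return stat


def permutation_p_value(reads: list[Read], trials: int = 1000, seed: int = 0) -> float:
    """Estimate P(statistic >= observed) when contig labels are shuffled among reads.

    A small value means plates and contigs are strongly associated, which in this
    sketch is treated as a sign that one plate's reads assemble apart from the rest.
    """
    rng = random.Random(seed)
    observed = chi_square(reads)
    plates = [p for p, _ in reads]
    contigs = [c for _, c in reads]
    hits = 0
    for _ in range(trials):
        rng.shuffle(contigs)
        if chi_square(list(zip(plates, contigs))) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (trials + 1)


def plate_groups(reads: list[Read], min_reads: int = 2) -> list[set[str]]:
    """Connected components of a graph linking plates that share a contig.

    A plate counts as present in a contig only if it contributes at least
    `min_reads` reads there (hypothetical threshold to ignore stray reads).
    """
    counts = Counter(reads)
    plates_by_contig: dict[str, set[str]] = defaultdict(set)
    for (p, c), k in counts.items():
        if k >= min_reads:
            plates_by_contig[c].add(p)
    adj: dict[str, set[str]] = {p: set() for p, _ in reads}
    for plates in plates_by_contig.values():
        for p in plates:
            adj[p] |= plates - {p}
    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in sorted(adj):  # depth-first search from each unvisited plate
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        groups.append(comp)
    return groups


if __name__ == "__main__":
    # P1 and P2 are spread over contigs c1 and c2; P3 assembles only into c3.
    mixed = ([("P1", "c1")] * 10 + [("P1", "c2")] * 10 + [("P2", "c1")] * 10
             + [("P2", "c2")] * 10 + [("P3", "c3")] * 20)
    print([sorted(g) for g in plate_groups(mixed)])  # [['P1', 'P2'], ['P3']]
    print(permutation_p_value(mixed))
```

In the example, plates P1 and P2 share contigs and fall into one group while P3 forms its own group, and the permutation p-value is expected to be small, so P3 would be flagged as a likely mixup. A permutation test was chosen here only because it needs nothing beyond the standard library and makes no large-sample assumptions; an exact or asymptotic test could be substituted.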

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artifacts
  • Computer Simulation
  • Contig Mapping / methods
  • Documentation / methods*
  • Equipment Failure Analysis / methods*
  • Human Genome Project
  • Humans
  • Models, Biological
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis / methods
  • Quality Control
  • Reproducibility of Results
  • Research Design
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / instrumentation*
  • Sequence Analysis, DNA / methods
  • Sequence Analysis, DNA / standards