Reducing system noise in copy number data using principal components of self-self hybridizations

Proc Natl Acad Sci U S A. 2012 Jan 17;109(3):E103-10. doi: 10.1073/pnas.1106233109. Epub 2011 Dec 29.

Abstract

Genomic copy number variation underlies genetic disorders such as autism, schizophrenia, and congenital heart disease. Copy number variations are commonly detected by array-based comparative genomic hybridization of sample to reference DNAs, but probe and operational variables combine to create correlated system noise that degrades detection of genetic events. To correct for this, we have explored hybridizations in which no genetic signal is expected, namely "self-self" hybridizations (SSH) comparing DNAs from the same genome. We show that SSH trap a variety of correlated system noise that is also present in sample-reference (test) data. Through singular value decomposition of SSH, we are able to determine the principal components (PCs) of this noise. The PCs themselves offer deep insights into the sources of noise and facilitate detection of artifacts. We present evidence that linear and piecewise linear correction of test data with the PCs does not introduce detectable spurious signal, yet improves signal-to-noise metrics, reduces false positives, and facilitates copy number determination.
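
The linear correction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact procedure, so the code assumes a plain least-squares projection onto the leading left singular vectors of an SSH matrix, with no centering and no piecewise step. The function names `noise_components` and `linear_correct`, the matrix layout (probes in rows, hybridizations in columns), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def noise_components(ssh, k):
    """Return the top-k noise PCs of an SSH matrix (hypothetical helper).

    ssh: array of shape (n_probes, n_hybridizations) holding log ratios
         from self-self hybridizations, where no genetic signal is expected.
    k:   number of principal components to keep.
    """
    # Left singular vectors are orthonormal patterns across probes.
    u, _, _ = np.linalg.svd(ssh, full_matrices=False)
    return u[:, :k]


def linear_correct(test, pcs):
    """Subtract from a test profile its least-squares fit to the noise PCs."""
    # Because the columns of pcs are orthonormal, the fit coefficients
    # are simply the inner products of each PC with the test profile.
    coeffs = pcs.T @ test
    return test - pcs @ coeffs


# Synthetic illustration only: two hidden noise patterns shared by
# the SSH arrays and one test array carrying a small gain.
rng = np.random.default_rng(0)
n_probes = 500
patterns = rng.normal(size=(n_probes, 2))
ssh = patterns @ rng.normal(size=(2, 20)) + 0.05 * rng.normal(size=(n_probes, 20))

signal = np.zeros(n_probes)
signal[100:150] = 0.5  # simulated copy number gain
test = signal + patterns @ np.array([0.3, -0.2]) + 0.05 * rng.normal(size=n_probes)

pcs = noise_components(ssh, k=2)
corrected = linear_correct(test, pcs)

print("residual noise (raw):      ", np.std(test - signal))
print("residual noise (corrected):", np.std(corrected - signal))
```

One caveat of this simple projection is that any part of the true signal that happens to align with the retained PCs is removed as well, so the number of components `k` has to be chosen with care.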

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Copy Number Variations / genetics*
  • DNA Probes / metabolism
  • Databases, Genetic*
  • Genome, Human / genetics
  • Humans
  • Hybridization, Genetic*
  • Male
  • Principal Component Analysis
  • Reference Standards

Substances

  • DNA Probes

Associated data

  • GEO/GSE23682