A pipeline for copy number variation detection based on principal component analysis

Annu Int Conf IEEE Eng Med Biol Soc. 2011:2011:6975-8. doi: 10.1109/IEMBS.2011.6091763.

Abstract

DNA copy number variation (CNV), an important class of structural variation, is pervasive in the human genome, and determining CNVs is essential to understanding their potential effects on disease susceptibility. However, CNV detection from SNP array data is challenging because of the low signal-to-noise ratio. In this study, we propose a principal component analysis (PCA) based approach for data correction and present a novel processing pipeline for reliable CNV detection. We test the approach on both simulated and real SNP array datasets. Simulations demonstrate a substantial reduction in the false positive rate of CNV detection after PCA correction. We also observe a significant improvement in data quality in real SNP array data after correction.
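
The abstract does not spell out how the PCA correction is performed, so the following is only a generic sketch of one common way to do PCA-based noise removal, not the authors' pipeline. It assumes a samples-by-probes matrix of intensity values (for example log R ratios), treats the leading principal components as shared systematic noise, and subtracts them. The function name `pca_correct`, the parameter `n_components`, and the simulated wave artifact are all hypothetical choices made for illustration.

```python
import numpy as np


def pca_correct(intensity, n_components=1):
    """Subtract the leading principal components from a samples x probes matrix.

    intensity: 2-D array, rows are samples, columns are SNP probes.
    n_components: number of leading components treated as noise (an assumption;
                  the abstract does not say how many components are removed).
    """
    x = np.asarray(intensity, dtype=float)
    probe_means = x.mean(axis=0)
    centered = x - probe_means
    # SVD of the centered matrix: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, len(s))
    # Rank-k reconstruction of the variation captured by the leading components.
    noise = (u[:, :k] * s[:k]) @ vt[:k]
    return centered - noise + probe_means


# Simulated example: one artifact shared across samples, plus white noise.
rng = np.random.default_rng(0)
n_samples, n_probes = 20, 200
wave = np.sin(np.linspace(0, 6 * np.pi, n_probes))
scale = rng.normal(0, 1, size=(n_samples, 1))
data = scale * wave + rng.normal(0, 0.1, size=(n_samples, n_probes))
corrected = pca_correct(data, n_components=1)
```

In this toy setup the shared artifact dominates the first component, so removing it lowers the overall variance while leaving per-probe means unchanged. On real data, a CNV that is common across samples could also load on a leading component, so the number of components to remove would need to be chosen with care.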

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Analysis of Variance
  • Computer Simulation
  • DNA / analysis
  • DNA Copy Number Variations*
  • False Positive Reactions
  • Female
  • Genetic Predisposition to Disease
  • Genome, Human
  • Genotype
  • Humans
  • Male
  • Normal Distribution
  • Polymorphism, Single Nucleotide
  • Principal Component Analysis
  • Reproducibility of Results
  • Signal Processing, Computer-Assisted*

Substances

  • DNA