Computational prediction of methylation status in human genomic sequences

Proc Natl Acad Sci U S A. 2006 Jul 11;103(28):10713-6. doi: 10.1073/pnas.0602949103. Epub 2006 Jul 3.

Abstract

Epigenetic effects in mammals depend largely on heritable genomic methylation patterns. We describe a computational pattern recognition method that predicts the methylation landscape of human brain DNA. The method can be applied both to CpG islands and to non-CpG island regions. It computes the methylation propensity of an 800-bp region centered on a CpG dinucleotide from specific sequence features within that region. We compared the classification performance of several classifiers, including K-means clustering, linear discriminant analysis, logistic regression, and a support vector machine; the support vector machine performed best. Our program, hdfinder, currently has a prediction accuracy of 86%, as validated against CpG regions whose methylation status has been experimentally determined. Using hdfinder, we have depicted the genome-wide methylation patterns of all 22 human autosomes.
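The abstract names the ingredients of the approach (an 800-bp window centered on each CpG, sequence features computed over that window, and a support vector machine) but not the feature set or SVM settings. The sketch below only illustrates that overall shape, under stated assumptions: the two features (GC fraction and CpG observed/expected ratio) are hypothetical placeholders, not hdfinder's features, and the classifier is a linear SVM trained with Pegasos-style stochastic subgradient descent, a stand-in chosen so the example runs on the Python standard library alone. All function names are illustrative.

```python
import random

WINDOW_BP = 800  # window length stated in the abstract


def cpg_positions(seq):
    """0-based index of the C in every CpG dinucleotide of seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def window_around(seq, pos, width=WINDOW_BP):
    """Up to `width` bp centered on the CpG at `pos`, clipped at the sequence ends."""
    half = width // 2
    return seq[max(0, pos - half):min(len(seq), pos + half)]


def features(window):
    """HYPOTHETICAL features (not hdfinder's): GC fraction, CpG observed/expected ratio."""
    w = window.upper()
    n = len(w)
    if n == 0:
        return [0.0, 0.0]
    c, g = w.count("C"), w.count("G")
    cg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
    gc_frac = (c + g) / n
    obs_exp = cg * n / (c * g) if c and g else 0.0
    return [gc_frac, obs_exp]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent on the
    L2-regularized hinge loss. Labels must be -1 or +1. A constant 1.0 is
    appended to each vector, so the last weight acts as a (regularized) bias."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    """+1 or -1; mapping +1 to 'methylated' is an assumption of this sketch."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def accuracy(w, X, y):
    """Fraction of examples whose predicted label matches the known label."""
    return sum(predict(w, x) == yi for x, yi in zip(X, y)) / len(y)
```

To apply it, one would call `cpg_positions` on a sequence, compute `features(window_around(seq, p))` for each position `p`, train on CpGs whose methylation status is experimentally known, and measure `accuracy` on a separate set of characterized CpGs. The 86% figure in the abstract describes hdfinder itself, not this sketch.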

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology*
  • DNA / chemistry*
  • DNA / metabolism
  • DNA Methylation*
  • Genome, Human*
  • Humans
  • Predictive Value of Tests

Substances

  • DNA