CpG_MI: a novel approach for identifying functional CpG islands in mammalian genomes

Nucleic Acids Res. 2010 Jan;38(1):e6. doi: 10.1093/nar/gkp882. Epub 2009 Oct 23.

Abstract

CpG islands (CGIs) are regions rich in CpG dinucleotides relative to the CpG-depleted bulk DNA of mammalian genomes, and are generally regarded as epigenetic regulatory regions associated with an unmethylated state, promoter activity and histone modifications. Accurate identification of CGIs with epigenetic regulatory function within bulk genomic DNA is of wide interest. Here, the common features of functional CGIs are identified using an average mutual information method, which differentiates functional CGIs from the remaining CGIs. A new approach (CpG mutual information, CpG_MI) was then developed to identify functional CGIs based on the cumulative mutual information of the physical distances between neighboring CpGs. Compared to current approaches, CpG_MI achieved the highest prediction accuracy. It also identified new functional CGIs overlapping with gene promoter regions that were missed by other algorithms. Nearly all CGIs identified by CpG_MI overlapped with histone modification marks. CpG_MI could also be used to identify potential functional CGIs in other mammalian genomes, as the CpG dinucleotide contents and cumulative mutual information distributions are almost the same among the six mammalian genomes in our analysis. CpG_MI is a reliable quantitative tool for identifying functional CGIs in bulk genomic DNA and helps in understanding the relationships between genomic functional elements and epigenomic modifications.
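
The abstract does not give the formulas behind CpG_MI, so the following Python sketch is only an illustration of the general idea: measure the physical distances between neighboring CpGs and estimate how much information one distance carries about a later one. The function names, the `lag` and `max_lag` parameters, and the choice to "accumulate" by summing over lags are all assumptions made for this example, not the published algorithm.

```python
import math
from collections import Counter


def cpg_positions(seq):
    """Return 0-based start positions of every CG dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def neighbor_distances(positions):
    """Physical distance (in bp) between each pair of neighboring CpGs."""
    return [b - a for a, b in zip(positions, positions[1:])]


def mutual_information(distances, lag=1):
    """Empirical mutual information (bits) between a distance and the
    distance `lag` steps later. `lag` is a hypothetical parameter."""
    pairs = list(zip(distances, distances[lag:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (left[x] * right[y])
        mi += p_xy * math.log2(count * n / (left[x] * right[y]))
    return mi


def cumulative_mutual_information(distances, max_lag=3):
    """Assumed definition: sum of mutual information over lags 1..max_lag."""
    return sum(mutual_information(distances, lag) for lag in range(1, max_lag + 1))


# Usage on a toy sequence (not real genomic data).
toy = "ACGTCGCGAACGTTCGCGCGAT"
dists = neighbor_distances(cpg_positions(toy))
print(dists, cumulative_mutual_information(dists, max_lag=2))
```

A real analysis would score sliding windows along a chromosome and apply a threshold to call candidate CGIs; the window size and threshold are not stated in the abstract and are therefore left out here.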

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Cattle
  • CpG Islands*
  • Dogs
  • Genome, Human
  • Genomics / methods*
  • Histones / metabolism
  • Humans
  • Mice
  • Rats

Substances

  • Histones