Identification of genes within CpG-enriched DNA from human chromosome 4p16.3

Hum Mol Genet. 1994 Sep;3(9):1611-6. doi: 10.1093/hmg/3.9.1611.

Abstract

We combined the isolation of gene-enriched genomic DNA with computer-based gene prediction to search for genes in a cosmid contig covering one million base pairs in the Huntington disease region on chromosome 4. Our aim was to develop a simple, robust strategy to identify genes adjacent to CpG islands without first characterizing undermethylated regions with multiple rare-cutter restriction enzyme sites. We cloned DNA adjacent to sites for the rare-cutter restriction enzymes EagI and SacII, which are predicted to cut more frequently within CpG islands, and relied solely on minimal sequence analysis to determine the likely coding potential of the DNA next to these sites. Our results indicated that isolating fragments with a single rare-cutter restriction enzyme site was sufficient to provide a high likelihood of identifying genes. Of the 42 CpG-selected clones analyzed, 17 contained exons, as shown by sequence identity to known genes in this region, sequence identity to gene fragments isolated by direct cDNA selection in our laboratory, and/or their ability to detect transcripts on Northern blots. Analysis of the sequences with the BLAST and GRAIL programs provided additional independent evidence that 15 of these 17 clones contain coding sequences and that nine other clones are likely to contain sequences coding for portions of new genes. By mapping these clones to an EcoRI restriction map of the region, we determined a detailed localization for each of the exons and estimate that there are at least seven genes that contain CpG-rich DNA between D4S126 and D4S181.
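The idea of using EagI and SacII sites as markers for CpG-rich DNA can be illustrated with a minimal Python sketch. This is not the procedure used in the study, which cloned the fragments experimentally; it only shows how one might locate the two recognition sequences (CGGCCG for EagI, CCGCGG for SacII) in a sequence and compute the GC fraction and the observed/expected CpG ratio of a stretch of DNA. The function names `find_sites` and `cpg_stats` and the toy sequence are hypothetical, introduced here for illustration.

```python
import re

# Recognition sequences of the two rare-cutter enzymes named in the abstract.
# Both are palindromic, so scanning one strand is enough.
SITES = {"EagI": "CGGCCG", "SacII": "CCGCGG"}


def find_sites(seq):
    """Return sorted (position, enzyme) pairs for every site in seq (0-based)."""
    seq = seq.upper()
    hits = []
    for enzyme, motif in SITES.items():
        # A lookahead pattern reports overlapping occurrences as well.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start(), enzyme))
    return sorted(hits)


def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for seq."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n
    # Observed/expected CpG = (#CpG * length) / (#C * #G); 0.0 if undefined.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


if __name__ == "__main__":
    # Toy sequence (an assumption, not data from the paper).
    toy = "ATATAT" + "CGGCCG" + "GCGCGC" + "CCGCGG" + "TATATA"
    for pos, enzyme in find_sites(toy):
        print(enzyme, pos)
    print(cpg_stats(toy))
```

In practice one would apply `cpg_stats` to a window around each site reported by `find_sites`; the window size and any threshold for calling a region CpG-rich are choices left to the user and are not specified in the abstract.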

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Chromosome Mapping*
  • Chromosomes, Human, Pair 4*
  • Cloning, Molecular
  • Cosmids
  • DNA / genetics*
  • Exons
  • Genetic Markers
  • Humans
  • Molecular Sequence Data
  • Oligodeoxyribonucleotides / genetics*
  • RNA / genetics
  • Repetitive Sequences, Nucleic Acid
  • Software

Substances

  • Genetic Markers
  • Oligodeoxyribonucleotides
  • RNA
  • DNA

Associated data

  • GENBANK/L33986
  • GENBANK/L33987
  • GENBANK/L33988
  • GENBANK/L33989
  • GENBANK/L33990
  • GENBANK/L33991
  • GENBANK/L33992
  • GENBANK/L33993
  • GENBANK/L33994
  • GENBANK/L33995
  • GENBANK/L33996
  • GENBANK/L33997
  • GENBANK/L33998
  • GENBANK/L33999
  • GENBANK/L34000
  • GENBANK/L34001
  • GENBANK/L34002
  • GENBANK/L34003
  • GENBANK/L34004
  • GENBANK/L34005
  • GENBANK/L34007