Important biological information uncovered in previously unaligned reads from chromatin immunoprecipitation experiments (ChIP-Seq)

Sci Rep. 2015 Mar 2:5:8635. doi: 10.1038/srep08635.

Abstract

Establishing the architecture of gene regulatory networks (GRNs) relies on chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq), which identifies transcription factor binding sites (TFBSs) genome-wide. ChIP-Seq yields millions of short reads that, after alignment, describe the genome-wide binding sites of a particular transcription factor (TF). However, in all organisms investigated, an average of 40% of reads fail to align to the corresponding genome, and in some datasets as many as 80% of reads fail to align. We describe here the provenance of previously unaligned reads in ChIP-Seq experiments from animals and plants. We show that a substantial portion corresponds to sequences of bacterial and metazoan origin, irrespective of the ChIP-Seq chromatin source. Unexpectedly, 30%-40% of the unaligned reads were actually alignable. To validate these observations, we investigated the characteristics of the previously unaligned reads corresponding to TAL1, a human TF involved in lineage specification of hemopoietic cells. We show that, while unmapped ChIP-Seq read datasets contain foreign DNA sequences, additional TFBSs can be identified from the previously unaligned ChIP-Seq reads. Our results indicate that re-evaluating previously unaligned reads from ChIP-Seq experiments will contribute significantly to TF target identification and to determining the emerging properties of GRNs.
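
As an illustration of the unaligned-read fraction discussed above, the following minimal sketch shows one way such a figure could be computed from aligner output in SAM text format. It is not the authors' pipeline: the function name `unaligned_fraction` and the example records are hypothetical, and the only facts assumed are the SAM conventions that header lines begin with `@`, that the second column holds the bitwise FLAG, and that bit 0x4 marks an unmapped read while bits 0x100 and 0x800 mark secondary and supplementary records.

```python
def unaligned_fraction(sam_lines):
    """Return the fraction of reads flagged as unmapped in SAM-formatted lines.

    Secondary (0x100) and supplementary (0x800) records are skipped so that
    each read is counted once. Intended for single-end data.
    """
    total = 0
    unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 is the bitwise FLAG
        if flag & 0x900:  # secondary or supplementary alignment record
            continue
        total += 1
        if flag & 0x4:  # bit 0x4: read is unmapped
            unmapped += 1
    return unmapped / total if total else 0.0


# Hypothetical toy input: five reads, two of them unmapped.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
    "r5\t0\tchr2\t50\t60\t4M\t*\t0\t0\tGGGG\tIIII",
]
fraction = unaligned_fraction(example)
```

The reads flagged as unmapped in such a file are the ones that would be candidates for the kind of re-evaluation the abstract describes, for example re-alignment under different settings or against other reference genomes.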

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Base Composition / genetics
  • Base Sequence
  • Basic Helix-Loop-Helix Transcription Factors / genetics
  • Chromatin Immunoprecipitation / methods*
  • Chromosomes, Human / genetics
  • Humans
  • Protein Binding
  • Proto-Oncogene Proteins / genetics
  • Reproducibility of Results
  • Sequence Alignment*
  • Sequence Analysis, DNA*
  • T-Cell Acute Lymphocytic Leukemia Protein 1

Substances

  • Basic Helix-Loop-Helix Transcription Factors
  • Proto-Oncogene Proteins
  • T-Cell Acute Lymphocytic Leukemia Protein 1
  • TAL1 protein, human