Allelome.PRO, a pipeline to define allele-specific genomic features from high-throughput sequencing data

Nucleic Acids Res. 2015 Dec 2;43(21):e146. doi: 10.1093/nar/gkv727. Epub 2015 Jul 21.

Abstract

Detecting allelic biases from high-throughput sequencing data requires an approach that maximizes sensitivity while minimizing false positives. Here, we present Allelome.PRO, an automated, user-friendly bioinformatics pipeline, which uses high-throughput sequencing data from reciprocal crosses of two genetically distinct mouse strains to detect allele-specific expression and chromatin modifications. Allelome.PRO extends approaches used in previous studies that exclusively analyzed imprinted expression to give a complete picture of the 'allelome' by automatically categorizing the allelic expression of all genes in a given cell type into imprinted, strain-biased, biallelic or non-informative. Allelome.PRO offers increased sensitivity to analyze lowly expressed transcripts, together with a robust false discovery rate empirically calculated from variation in the sequencing data. We used RNA-seq data from mouse embryonic fibroblasts from F1 reciprocal crosses to determine a biologically relevant allelic ratio cutoff, and define for the first time an entire allelome. Furthermore, we show that Allelome.PRO detects differential enrichment of H3K4me3 over promoters from ChIP-seq data, validating the RNA-seq results. This approach can be easily extended to analyze histone marks of active enhancers or transcription factor binding sites, and therefore provides a powerful tool to identify candidate cis-regulatory elements genome-wide.
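
The four categories named above can be illustrated with a minimal sketch of how allelic read counts from the two reciprocal crosses might be combined. This is not the Allelome.PRO implementation: the function name `classify_locus`, the cutoff of 0.7, the minimum read count of 20 and the simple ratio test are all assumptions chosen for illustration, and the empirically calculated false discovery rate described in the abstract is omitted entirely.

```python
def classify_locus(fwd_a, fwd_b, rev_a, rev_b, cutoff=0.7, min_reads=20):
    """Assign one locus to an allelic category (illustrative only).

    fwd_a, fwd_b: reads carrying strain A / strain B alleles in the forward
                  cross (strain A mother x strain B father).
    rev_a, rev_b: the same counts in the reverse cross
                  (strain B mother x strain A father).
    cutoff, min_reads: hypothetical thresholds, not values from the paper.
    """
    # Too few allele-informative reads in either cross: no call is made.
    if fwd_a + fwd_b < min_reads or rev_a + rev_b < min_reads:
        return "non-informative"

    # Fraction of reads from the strain A allele in each cross.
    fwd_ratio = fwd_a / (fwd_a + fwd_b)
    rev_ratio = rev_a / (rev_a + rev_b)

    low = 1.0 - cutoff
    fwd_hi, rev_hi = fwd_ratio >= cutoff, rev_ratio >= cutoff
    fwd_lo, rev_lo = fwd_ratio <= low, rev_ratio <= low

    # The bias follows the parent of origin: the favoured strain switches
    # between crosses because the maternal strain switches.
    if fwd_hi and rev_lo:
        return "imprinted (maternal)"
    if fwd_lo and rev_hi:
        return "imprinted (paternal)"

    # The bias follows the strain: the same strain is favoured in both crosses.
    if fwd_hi and rev_hi:
        return "strain-biased (A)"
    if fwd_lo and rev_lo:
        return "strain-biased (B)"

    # Neither cross passes the cutoff in either direction.
    if not (fwd_hi or fwd_lo or rev_hi or rev_lo):
        return "biallelic"

    # Biased in one cross only: the two crosses do not agree on a category.
    return "non-informative"


if __name__ == "__main__":
    print(classify_locus(90, 10, 10, 90))
    print(classify_locus(90, 10, 95, 5))
    print(classify_locus(50, 50, 45, 55))
```

The reciprocal design is what separates the categories in this sketch: a parent-of-origin effect flips the favoured strain between the two crosses, whereas a strain effect does not. A single cross could not distinguish the two.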

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Alleles*
  • Animals
  • Cells, Cultured
  • Chromatin Immunoprecipitation
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods*
  • Histone Code
  • Mice
  • Mice, Inbred Strains
  • Sequence Analysis, DNA
  • Sequence Analysis, RNA
  • Software*