Comparing computational methods for identification of allele-specific expression based on next generation sequencing data

Genet Epidemiol. 2014 Nov;38(7):591-8. doi: 10.1002/gepi.21846. Epub 2014 Sep 2.

Abstract

Allele-specific expression (ASE) studies have wide-ranging implications for genome biology and medicine. Whole-transcriptome RNA sequencing (RNA-Seq) has emerged as a genome-wide tool for identifying ASE, but it suffers from mapping bias favoring reference alleles. Two categories of methods are currently used to reduce the effect of mapping bias on ASE identification: normalizing the RNA allelic ratio with the parallel genomic allelic ratio (pDNAar), and modifying the reference genome so that reads carrying either allele have the same chance of being mapped (mREF). We compared the sensitivity and specificity of both methods with simulated data and demonstrated that pDNAar, though practical in principle, had lower sensitivity because of its lower mapping rate for reads carrying nonreference (alternative) alleles, whereas mREF achieved higher sensitivity and specificity because it maps reads carrying either allele efficiently. Application of the two methods to real sequencing data showed that mREF was able to identify more ASE loci because of its higher mapping efficiency, and was able to correct some seemingly incorrect ASE loci identified by pDNAar, which arose from the inefficiency of pDNAar in mapping reads carrying alternative alleles. Our study provides useful information for RNA sequencing data processing in the identification of ASE.
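
The two correction strategies named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the reference was modified or how the ratios were normalized. The sketch assumes that mREF can be approximated by masking known SNP positions with `N`, and that pDNAar divides the RNA reference-allele fraction by the DNA reference-allele fraction. The function names `mask_reference` and `normalized_allelic_ratio` are hypothetical.

```python
from typing import Iterable


def mask_reference(seq: str, snp_positions: Iterable[int]) -> str:
    """mREF-style sketch (assumption): replace known SNP positions
    (0-based) with 'N' so that neither allele matches the reference."""
    bases = list(seq)
    for pos in snp_positions:
        bases[pos] = "N"
    return "".join(bases)


def normalized_allelic_ratio(rna_ref: int, rna_alt: int,
                             dna_ref: int, dna_alt: int) -> float:
    """pDNAar-style sketch (assumption): divide the RNA reference-allele
    fraction by the genomic (DNA) reference-allele fraction observed at
    the same site. A value near 1 suggests balanced expression."""
    if rna_ref + rna_alt == 0 or dna_ref + dna_alt == 0 or dna_ref == 0:
        raise ValueError("insufficient read counts at this site")
    rna_ratio = rna_ref / (rna_ref + rna_alt)
    dna_ratio = dna_ref / (dna_ref + dna_alt)
    return rna_ratio / dna_ratio


# Hypothetical toy inputs, for illustration only.
masked = mask_reference("ACGTACGT", [2, 5])
ratio = normalized_allelic_ratio(30, 10, 20, 20)
```

In this sketch the normalization only rescales counts from reads that were mapped, so reads carrying the alternative allele that failed to map are not recovered. Masking changes the reference before alignment, so both alleles mismatch the reference equally at the masked site.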

Keywords: RNA sequencing; allele-specific expression; next-generation sequencing.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Base Sequence
  • Chromosome Mapping
  • Computer Simulation
  • False Negative Reactions
  • Gene Expression Profiling / methods*
  • Genome
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Sequence Analysis, DNA
  • Sequence Analysis, RNA*
  • Software
  • Transcriptome