RNA-Seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations

Steven C Munger; Narayanan Raghupathy; Kwangbom Choi; Allen K Simons; Daniel M Gatti; Douglas A Hinerfeld; Karen L Svenson; Mark P Keller; Alan D Attie; Matthew A Hibbs; Joel H Graber; Elissa J Chesler; Gary A Churchill

doi:10.1534/genetics.114.165886


Genetics. 2014 Sep;198(1):59-73. doi: 10.1534/genetics.114.165886.

Affiliations

¹ The Jackson Laboratory, Bar Harbor, Maine 04609.
² University of Wisconsin, Madison, Wisconsin 53705.
³ The Jackson Laboratory, Bar Harbor, Maine 04609; Trinity University, San Antonio, Texas 78212.
⁴ The Jackson Laboratory, Bar Harbor, Maine 04609.

Abstract

Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations.
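
The misalignment problem described above can be illustrated with a toy example. The following is a minimal sketch, not the Seqnature implementation: it assumes substitutions only, and the sequences, variant positions, and the function names `individualize` and `mismatches` are all hypothetical. It shows that a read drawn from an individual carrying two variants has mismatches against the reference sequence but none against the individualized sequence.

```python
from typing import Dict


def individualize(reference: str, snps: Dict[int, str]) -> str:
    """Return a copy of `reference` with each 0-based SNP position replaced.

    Hypothetical helper: handles single-base substitutions only.
    """
    seq = list(reference)
    for pos, allele in snps.items():
        seq[pos] = allele
    return "".join(seq)


def mismatches(read: str, target: str, offset: int) -> int:
    """Count mismatches when `read` is placed at `offset` on `target`."""
    window = target[offset:offset + len(read)]
    return sum(a != b for a, b in zip(read, window))


# Toy data (assumed for illustration, not from the paper).
reference = "ACGTACGTACGTACGTACGT"
snps = {5: "T", 9: "A"}  # hypothetical variants in one individual

individual = individualize(reference, snps)

# A 10-base read originating from the individual's sequence at offset 3.
read = individual[3:13]

ref_mm = mismatches(read, reference, 3)   # penalized by both variants
ind_mm = mismatches(read, individual, 3)  # exact match

print(f"mismatches vs reference:      {ref_mm}")
print(f"mismatches vs individualized: {ind_mm}")
```

In a real aligner, reads that accumulate mismatches in this way may be discarded or placed at a different locus, which is the source of the bias in abundance estimates that the abstract describes.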

Keywords: Diversity Outbred (DO); Diversity Outbred mice; MPP; Multiparent Advanced Generation Inter-Cross (MAGIC); Multiparental populations; QTL mapping; RNA-seq; expression QTL; haplotype reconstruction; high-density genotyping; mixed models.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Animals
Female
Genome
Male
Mice
Quantitative Trait Loci
Sequence Alignment / methods*
Sequence Analysis, RNA / methods*
Software*
Transcriptome*
