Haplotype-aware pantranscriptome analyses using spliced pangenome graphs

Jonas A Sibbesen; Jordan M Eizenga; Adam M Novak; Jouni Sirén; Xian Chang; Erik Garrison; Benedict Paten

doi:10.1038/s41592-022-01731-9

Nat Methods. 2023 Feb;20(2):239-247. doi: 10.1038/s41592-022-01731-9. Epub 2023 Jan 16.

Authors

Jonas A Sibbesen^#¹, Jordan M Eizenga^#¹, Adam M Novak¹, Jouni Sirén¹, Xian Chang¹, Erik Garrison², Benedict Paten³

Affiliations

¹ UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.
² University of Tennessee Health Science Center, Memphis, TN, USA.
³ UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

^# Contributed equally.

PMID: 36646895
DOI: 10.1038/s41592-022-01731-9

Abstract

Pangenomics is emerging as a powerful computational paradigm in bioinformatics. This field uses population-level genome reference structures, typically consisting of a sequence graph, to mitigate reference bias and facilitate analyses that were challenging with previous reference-based methods. In this work, we extend these methods into transcriptomics to analyze sequencing data using the pantranscriptome: a population-level transcriptomic reference. Our toolchain, which consists of additions to the VG toolkit and a standalone tool, RPVG, can construct spliced pangenome graphs, map RNA sequencing data to these graphs, and perform haplotype-aware expression quantification of transcripts in a pantranscriptome. We show that this workflow improves accuracy over state-of-the-art RNA sequencing mapping methods, and that it can efficiently quantify haplotype-specific transcript expression without needing to characterize the haplotypes of a sample beforehand.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology*
Gene Expression Profiling*
Haplotypes
Metagenomics
Transcriptome