Handling multi-mapped reads in RNA-seq

Gabrielle Deschamps-Francoeur; Joël Simoneau; Michelle S Scott

doi:10.1016/j.csbj.2020.06.014

Comput Struct Biotechnol J. 2020 Jun 12;18:1569-1576. doi: 10.1016/j.csbj.2020.06.014. eCollection 2020.

Authors

Gabrielle Deschamps-Francoeur¹, Joël Simoneau¹, Michelle S Scott¹

Affiliation

¹ Département de Biochimie et Génomique Fonctionnelle, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada.

Abstract

Many eukaryotic genomes harbour large numbers of duplicated sequences, of diverse biotypes, resulting from several mechanisms including recombination, whole genome duplication and retro-transposition. Such repeated sequences complicate gene/transcript quantification during RNA-seq analysis because reads can map to more than one locus, sometimes involving genes embedded in other genes. Genes of different biotypes have dissimilar levels of sequence duplication, with long noncoding RNAs and messenger RNAs sharing less sequence similarity with other genes than biotypes encoding shorter RNAs. Many strategies have been developed to handle these multi-mapped reads, resulting in increased accuracy in gene/transcript quantification, although separate tools are typically used to estimate the abundance of short and long genes due to their dissimilar characteristics. This review discusses the mechanisms leading to sequence duplication, the biotypes affected, the computational strategies employed to deal with multi-mapped reads and the challenges that remain to be overcome.

Keywords: Duplicated genes; Expectation–maximization algorithm; Gene isoforms; Multi-mapped reads; Noncoding RNAs; RNA-seq.
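
As an illustration of the expectation-maximization approach named in the keywords, the following is a minimal toy sketch and not the method of any particular tool discussed in the review. It assumes each read is represented only by the list of genes it aligns to, ignores gene length and alignment quality, and uses a hypothetical function name (`em_multimap`) and hypothetical gene labels ("A", "B").

```python
from collections import defaultdict


def em_multimap(read_hits, n_iter=50):
    """Toy EM for multi-mapped reads (illustrative assumption, not a real tool).

    read_hits: list of non-empty lists, each holding the distinct gene names
    one read aligns to. Returns estimated read counts per gene.
    """
    genes = sorted({g for hits in read_hits for g in hits})
    n_reads = len(read_hits)
    # Start from uniform relative abundances.
    theta = {g: 1.0 / len(genes) for g in genes}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read among its candidate genes
        # in proportion to the current abundance estimates.
        for hits in read_hits:
            total = sum(theta[g] for g in hits)
            for g in hits:
                counts[g] += theta[g] / total
        # M-step: re-estimate abundances from the fractional counts.
        theta = {g: counts[g] / n_reads for g in genes}
    return {g: theta[g] * n_reads for g in genes}


# Hypothetical data: 8 reads unique to A, 2 unique to B, 10 shared by both.
reads = [["A"]] * 8 + [["B"]] * 2 + [["A", "B"]] * 10
estimates = em_multimap(reads)
```

In this toy case the shared reads are divided according to the evidence from uniquely mapped reads, so the estimates approach 16 reads for A and 4 for B rather than an even split.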

Publication types

Review