RNA-seq data science: From raw data to effective interpretation

Dhrithi Deshpande; Karishma Chhugani; Yutong Chang; Aaron Karlsberg; Caitlin Loeffler; Jinyang Zhang; Agata Muszyńska; Viorel Munteanu; Harry Yang; Jeremy Rotman; Laura Tao; Brunilda Balliu; Elizabeth Tseng; Eleazar Eskin; Fangqing Zhao; Pejman Mohammadi; Paweł P Łabaj; Serghei Mangul

doi:10.3389/fgene.2023.997383

Front Genet. 2023 Mar 13;14:997383. doi: 10.3389/fgene.2023.997383. eCollection 2023.

Authors

Dhrithi Deshpande¹, Karishma Chhugani¹, Yutong Chang¹, Aaron Karlsberg², Caitlin Loeffler³, Jinyang Zhang⁴, Agata Muszyńska⁵,⁶, Viorel Munteanu⁷, Harry Yang⁸, Jeremy Rotman², Laura Tao⁹, Brunilda Balliu⁹, Elizabeth Tseng¹⁰, Eleazar Eskin³,⁹,¹¹, Fangqing Zhao⁴,¹², Pejman Mohammadi¹³, Paweł P Łabaj⁵,¹⁴, Serghei Mangul²,¹⁵

Affiliations

¹ Department of Pharmacology and Pharmaceutical Sciences, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States.
² Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States.
³ Department of Computer Science, University of California, Los Angeles, CA, United States.
⁴ Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China.
⁵ Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland.
⁶ Institute of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland.
⁷ Department of Computers, Informatics and Microelectronics, Technical University of Moldova, Chisinau, Moldova.
⁸ Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, United States.
⁹ Department of Computational Medicine, David Geffen School of Medicine at UCLA, CHS, Los Angeles, CA, United States.
¹⁰ Pacific Biosciences, Menlo Park, CA, United States.
¹¹ Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, United States.
¹² Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China.
¹³ Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States.
¹⁴ Department of Biotechnology, Boku University Vienna, Vienna, Austria.
¹⁵ Department of Quantitative and Computational Biology, USC Dornsife College of Letters, Arts and Sciences, Los Angeles, CA, United States.

Abstract

RNA sequencing (RNA-seq) has become an exemplary technology in modern biology and clinical science. Its immense popularity is due in large part to the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools for analyzing the enormous amounts of transcriptomic data it produces. RNA-seq analysis allows genes and their corresponding transcripts to be probed for a variety of purposes, such as detecting novel exons or whole transcripts, assessing the expression of genes and alternative transcripts, and studying alternative splicing structure. It can be challenging, however, to extract meaningful biological signals from raw RNA-seq data. The data are enormous in scale, and each sequencing technology has inherent limitations, such as amplification bias and library-preparation bias. The need to overcome these technical challenges has driven the rapid development of novel computational tools, which have evolved and diversified alongside technological advances, producing the current myriad of RNA-seq tools. These tools, combined with the diverse computational skill sets of biomedical researchers, help to unlock the full potential of RNA-seq. The purpose of this review is to explain basic concepts in the computational analysis of RNA-seq data and to define discipline-specific jargon.

Keywords: RNA sequencing; bioinformatics; differential gene expression; high throughput sequencing; read alignment; transcriptome quantification.

Publication types

Review

Grants and funding

R01 GM140287/GM/NIGMS NIH HHS/United States