SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses

Nicholas J Eagles; Emily E Burke; Jacob Leonard; Brianna K Barry; Joshua M Stolz; Louise Huuki; BaDoi N Phan; Violeta Larios Serrato; Everardo Gutiérrez-Millán; Israel Aguilar-Ordoñez; Andrew E Jaffe; Leonardo Collado-Torres

doi:10.1186/s12859-021-04142-3

BMC Bioinformatics. 2021 May 1;22(1):224. doi: 10.1186/s12859-021-04142-3.

Authors

Nicholas J Eagles¹, Emily E Burke¹, Jacob Leonard^{2,3}, Brianna K Barry^{1,4}, Joshua M Stolz¹, Louise Huuki¹, BaDoi N Phan^{1,5,6}, Violeta Larios Serrato^{2,7}, Everardo Gutiérrez-Millán², Israel Aguilar-Ordoñez^{2,8}, Andrew E Jaffe^{1,4,9,10,11,12,13}, Leonardo Collado-Torres^{14,15}

Affiliations

¹ Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA.
² Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico.
³ QuestBridge Scholar, Palo Alto, CA, 94303, USA.
⁴ Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA.
⁵ Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
⁶ Medical Scientist Training Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
⁷ Instituto Politécnico Nacional, Escuela Nacional de Ciencias Biológicas, Mexico City, CDMX, 11340, Mexico.
⁸ Department of Supercomputing, Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City, CDMX, 14610, Mexico.
⁹ Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA.
¹⁰ Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA.
¹¹ Department of Genetic Medicine, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA.
¹² Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA.
¹³ Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA.
¹⁴ Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA.
¹⁵ Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA.

Abstract

Background: RNA sequencing (RNA-seq) is a widely used biological assay that generates an ever-increasing amount of data. In practice, a researcher must perform many individual steps before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, performing only one step of a larger workflow, such as alignment of reads to a reference genome. The demand for a more comprehensive and reproducible workflow has led to a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses.

Results: In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, Docker integration), and different configuration files are provided (http://research.libd.org/SPEAQeasy/).

Conclusions: SPEAQeasy is user-friendly and lowers the computational entry barrier to RNA-seq data processing for biologists and clinicians, because the main input file is a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment.
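
To make the idea of a sample table concrete, the following Python sketch writes a tab-separated table with one row per sample and its FASTQ files. It is an illustration only: the function name `build_sample_table`, the column names (`sample_id`, `fastq_1`, `fastq_2`), and the file names are hypothetical and are not taken from SPEAQeasy; the pipeline's documentation defines the actual input format.

```python
import csv
import io


def build_sample_table(samples):
    """Return a tab-separated table with one row per sample.

    `samples` maps a sample name to a list of one (single-end) or two
    (paired-end) FASTQ paths. The column layout here is hypothetical and
    is not the SPEAQeasy input format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample_id", "fastq_1", "fastq_2"])
    for sample_id, fastqs in sorted(samples.items()):
        if len(fastqs) not in (1, 2):
            raise ValueError(
                f"{sample_id}: expected 1 or 2 FASTQ files, got {len(fastqs)}"
            )
        # Pad single-end samples so every row has the same number of columns.
        row = [sample_id] + list(fastqs) + [""] * (2 - len(fastqs))
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample names and file names, for illustration only.
example = build_sample_table({
    "sample_A": ["sample_A_R1.fastq.gz", "sample_A_R2.fastq.gz"],
    "sample_B": ["sample_B.fastq.gz"],
})
print(example)
```

Keeping the sample-to-file mapping in a single flat table means a new dataset can be described without editing any pipeline code, which is the kind of low entry barrier the conclusions describe.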

Keywords: Bioconductor; Pipeline; RNA-seq.

MeSH terms

High-Throughput Nucleotide Sequencing*
RNA-Seq
Sequence Analysis, RNA
Software*
Workflow
