Single-cell RNA sequencing (scRNA-Seq) studies have provided critical insight into the pathogenesis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19). scRNA-Seq library preparation methods and data processing workflows are generally designed for the detection and quantification of eukaryotic host mRNAs and not viral RNAs. Here, we compare different scRNA-Seq library preparation methods for their ability to detect and quantify SARS-CoV-2 RNAs with a focus on subgenomic mRNAs (sgmRNAs). We show that compared to 10X Genomics Chromium Next GEM Single Cell 3' (10X 3') libraries or 10X Genomics Chromium Next GEM Single Cell V(D)J (10X 5') libraries sequenced with standard read configurations, 10X 5' libraries sequenced with an extended length read 1 (R1) that covers both cell barcode and transcript sequence (termed "10X 5' with extended R1") increase the number of unambiguous reads spanning leader-sgmRNA junction sites. We further present a data processing workflow, single-cell coronavirus sequencing (scCoVseq), which quantifies reads unambiguously assigned to viral sgmRNAs or viral genomic RNA (gRNA). We find that combining 10X 5' with extended R1 library preparation/sequencing and scCoVseq data processing maximizes the number of viral unique molecular identifiers (UMIs) per cell quantified by scRNA-Seq. Corresponding sgmRNA expression levels are highly correlated with expression in matched bulk RNA-Seq data sets quantified with established tools for SARS-CoV-2 analysis. Using this scRNA-Seq approach, we find that SARS-CoV-2 gene expression is highly correlated across individual infected cells, which suggests that the proportion of viral sgmRNAs remains generally consistent throughout infection.
Taken together, these results and the corresponding data processing workflow enable robust quantification of coronavirus sgmRNA expression at single-cell resolution, thereby supporting high-resolution studies of viral RNA processes in individual cells.
IMPORTANCE Single-cell RNA sequencing (scRNA-Seq) has emerged as a valuable tool to study host-virus interactions, especially for coronavirus disease 2019 (COVID-19). Here, we compare the performance of different scRNA-Seq library preparation methods and sequencing strategies to detect SARS-CoV-2 RNAs and develop a data processing workflow to quantify unambiguous sequence reads derived from SARS-CoV-2 genomic RNA and subgenomic mRNAs. After establishing a workflow that maximizes the detection of SARS-CoV-2 subgenomic mRNAs, we explore patterns of SARS-CoV-2 gene expression across cells with variable levels of total viral RNA, assess host gene expression differences between infected and bystander cells, and identify noncanonical and low-abundance SARS-CoV-2 RNAs. The sequencing and data processing strategies developed here can enhance studies of coronavirus RNA biology at single-cell resolution and thereby contribute to our understanding of viral pathogenesis.
Keywords: coronavirus; molecular methods; single-cell RNA-seq; transcriptomics; viral pathogenesis; virology; virus-host interactions.