Background: Index hopping causes read assignment errors in data from multiplexed sequencing libraries. This issue has become more prevalent with the widespread use of high-capacity sequencers and highly multiplexed single-cell RNA sequencing (scRNA-seq).
Results: We conducted deep, plate-based scRNA-seq on a mixed population of mouse skin cells. Analysis of transcriptomes from 1152 cells identified four distinct cell types. To estimate the error rate in sample assignment due to index hopping, we employed differential expression analysis to identify signature genes that were highly and specifically expressed in each cell type. We quantified the proportion of misassigned reads by examining the detection rates of signature genes in other cell types. Remarkably, regardless of gene expression levels, we estimated that 0.65% of reads per gene were assigned to an incorrect cell across our data. To computationally compensate for index hopping, we developed a simple correction method wherein, for each gene, 0.65% of the library's average expression level was subtracted from the expression in each cell. This correction had notable effects on transcriptome analyses, including increased cell-cell clustering distance and alterations in intermediate state assignments of cell differentiation.
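The subtraction step described above can be illustrated with a minimal sketch. It assumes a genes-by-cells count matrix, takes "the library's average expression level" to mean the per-gene mean across all cells in the library, and clips negative values to zero after subtraction. The function name `correct_index_hopping`, the matrix orientation, and the clipping step are illustrative assumptions and are not taken from the original method description.

```python
import numpy as np


def correct_index_hopping(counts, hop_rate=0.0065):
    """Subtract a fixed fraction of each gene's library-wide mean from every cell.

    counts   : array-like, shape (n_genes, n_cells); orientation is an assumption.
    hop_rate : estimated fraction of misassigned reads (0.65% in the text above).
    """
    counts = np.asarray(counts, dtype=float)
    # Per-gene average expression across all cells in the library.
    gene_means = counts.mean(axis=1, keepdims=True)
    corrected = counts - hop_rate * gene_means
    # Assumption: expression cannot be negative, so clip at zero.
    return np.clip(corrected, 0.0, None)


# Toy example: two genes, two cells, each gene expressed in only one cell.
toy = np.array([[100.0, 0.0],
                [0.0, 200.0]])
corrected_toy = correct_index_hopping(toy)
```

In this toy matrix, the cell that expresses each gene loses only a small amount of signal, while the zero entries remain at zero because of the clipping step.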
Conclusions: Index hopping misassignments are measurable and can impact the experimental interpretation of sequencing results. We devised a straightforward method to estimate and correct for the index hopping rate by quantifying misassigned reads of signature genes in distinct cell types within an scRNA-seq library. This approach can be applied to any barcoded, multiplexed scRNA-seq library containing cells with distinct expression profiles, allowing for correction of the expression matrix before conducting biological analysis.