Systematic benchmark of single-cell hashtag demultiplexing approaches reveals robust performance of a clustering-based method

Mohammed Sayed; Yue Julia Wang; Hee-Woong Lim

doi:10.1093/bfgp/elae039

Brief Funct Genomics. 2024 Oct 10:elae039. doi: 10.1093/bfgp/elae039. Online ahead of print.

Authors

Mohammed Sayed¹, Yue Julia Wang², Hee-Woong Lim¹,³

Affiliations

¹ Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Ave., Cincinnati, OH 45229, United States.
² Department of Biomedical Sciences, College of Medicine, Florida State University, 1115 W Call St, Tallahassee, FL 32306, United States.
³ Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati College of Medicine, 3333 Burnet Ave., Cincinnati, OH 45229, United States.

PMID: 39387404
DOI: 10.1093/bfgp/elae039

Abstract

Single-cell technology has opened up a new avenue to delineate cellular status at single-cell resolution and has become an essential tool for studying human diseases. Multiplexing allows cost-effective experiments by combining multiple samples and effectively mitigates batch effects. It starts by giving each sample a unique tag and then pooling the samples together for library preparation and sequencing. After sequencing, sample demultiplexing is performed based on tag detection, where cells belonging to one sample are expected to have a higher amount of the corresponding tag than cells from other samples. In reality, however, demultiplexing is not straightforward because of noise and contamination from various sources, and successful demultiplexing depends on the efficient removal of such contamination. Here, we perform a systematic benchmark combining different normalization methods and demultiplexing approaches using real-world and simulated datasets. We show that accounting for sequencing depth variability increases the separability between tagged and untagged cells, and that the clustering-based approach outperforms existing tools. The clustering-based workflow is available as an R package from https://github.com/hwlim/hashDemux.
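To make the two ideas in the abstract concrete (depth-aware normalization, followed by clustering to separate tagged from untagged cells), the following is a minimal Python sketch. It is not the hashDemux implementation and does not reproduce the benchmarked methods. Every name in it (`depth_normalize`, `two_means_threshold`, `demultiplex`, the `scale` parameter) is hypothetical, and two things are assumptions made only for illustration: that "sequencing depth" can be represented by one per-cell value such as a total UMI count, and that a one-dimensional two-cluster k-means per tag is an acceptable stand-in for the clustering step.

```python
import math


def depth_normalize(counts, depths, scale=1e4):
    """Divide each cell's tag counts by that cell's depth, then log-transform.

    Assumption: `depths` holds one positive number per cell (for example a
    total UMI count). `scale` is an arbitrary illustrative constant.
    """
    return [
        [math.log1p(scale * c / depth) for c in cell]
        for cell, depth in zip(counts, depths)
    ]


def two_means_threshold(values, iterations=50):
    """Run 1-D k-means with k=2 and return the midpoint of the two centers."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return hi  # no spread: no value exceeds the threshold
    for _ in range(iterations):
        mid = (lo + hi) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if new_lo == lo and new_hi == hi:
            break  # cluster centers stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def demultiplex(counts, depths, tag_names):
    """Label each cell with a single tag name, 'doublet', or 'negative'."""
    norm = depth_normalize(counts, depths)
    # One threshold per tag, learned from that tag's values across all cells.
    thresholds = [
        two_means_threshold([cell[j] for cell in norm])
        for j in range(len(tag_names))
    ]
    calls = []
    for cell in norm:
        positive = [tag_names[j] for j, v in enumerate(cell) if v > thresholds[j]]
        if len(positive) == 1:
            calls.append(positive[0])
        elif positive:
            calls.append("doublet")
        else:
            calls.append("negative")
    return calls
```

In this sketch a cell is assigned to a sample only when exactly one tag falls in that tag's high cluster. Cells with several high tags are labeled doublets and cells with none are labeled negative. Real data would likely need a more careful treatment of background contamination than a single midpoint threshold.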

Keywords: benchmark; clustering; demultiplex; hashtag; single-cell data.
