Benchmarking digital PCR partition classification methods with empirical and simulated duplex data

Yao Chen; Ward De Spiegelaere; Wim Trypsteen; David Gleerup; Jo Vandesompele; Antoon Lievens; Matthijs Vynck; Olivier Thas

doi:10.1093/bib/bbae120

Brief Bioinform. 2024 Mar 27;25(3):bbae120. doi: 10.1093/bib/bbae120.

Authors

Yao Chen^{1,2,3}, Ward De Spiegelaere^{2,3}, Wim Trypsteen^{2,3,4}, David Gleerup^{2,3}, Jo Vandesompele^{3,5,6,7}, Antoon Lievens^{3}, Matthijs Vynck^{2,3}, Olivier Thas^{1,3,8,9}

Affiliations

¹ Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium.
² Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium.
³ Ghent University Digital PCR Consortium, Ghent University, Belgium.
⁴ Department of Internal Medicine, Ghent University and University Hospital, Belgium.
⁵ Department of Biomolecular Medicine, Ghent University and University Hospital, Belgium.
⁶ Cancer Research Institute Ghent (CRIG), Ghent University and University Hospital, Belgium.
⁷ Pxlence, Belgium.
⁸ I-BioStat, Data Science Institute, Hasselt University, Belgium.
⁹ National Institute for Applied Statistics Research Australia (NIASRA), University of Wollongong, Australia.

Abstract

Digital PCR (dPCR) is a highly accurate technique for the quantification of target nucleic acid(s). It has shown great potential in clinical applications, such as tumor liquid biopsy and validation of biomarkers. Accurate classification of partitions based on end-point fluorescence intensities is crucial to avoid biased estimators of the concentration of the target molecules. We have evaluated many clustering methods, from general-purpose methods to methods specific to dPCR and flow cytometry, on both simulated and real-life data. Clustering method performance was evaluated by simulating various scenarios. Based on our extensive comparison of clustering methods, we describe the limits of these methods and formulate guidelines for choosing an appropriate method. In addition, we have developed a novel method for simulating realistic dPCR data. The method is based on a mixture distribution of a Poisson point process and a skew-$t$ distribution, which enables the generation of irregular cluster shapes and of randomly scattered partitions between clusters ('rain'), as commonly observed in dPCR data. Users can fine-tune the model parameters and generate labeled datasets, using their own data as a template. Furthermore, the database of experimental dPCR data, augmented with the labeled simulated data, can serve as training and testing data for new clustering methods. The simulation method is available as an R Shiny app.

Keywords: absolute quantification; clustering; digital PCR; high-precision PCR; molecular diagnostics; nucleic acid amplification; nucleic acid quantification; simulation.
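
The abstract's point that partition misclassification biases the concentration estimate can be illustrated with a minimal sketch. It assumes the standard Poisson correction used in dPCR quantification, in which the mean number of target copies per partition is estimated as λ = −ln(1 − k/n) for k positive partitions out of n. The partition counts, the number of misclassified 'rain' partitions, and the function name `copies_per_partition` are hypothetical values and names chosen for illustration; they are not taken from the paper, and this is not the authors' benchmarking code.

```python
import math


def copies_per_partition(n_positive: int, n_total: int) -> float:
    """Standard Poisson estimate of the mean target copies per partition."""
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    return -math.log(1.0 - n_positive / n_total)


# Hypothetical run (illustrative numbers, not from the paper):
# 20,000 partitions, of which 4,000 truly contain the target.
n_total = 20_000
true_positive = 4_000
lam_true = copies_per_partition(true_positive, n_total)

# Assume a classifier mislabels 200 'rain' partitions as negative.
missed = 200
lam_biased = copies_per_partition(true_positive - missed, n_total)

# Relative bias of the estimate caused by the misclassification.
relative_bias = (lam_biased - lam_true) / lam_true

print(f"lambda with correct labels: {lam_true:.4f}")
print(f"lambda with {missed} missed positives: {lam_biased:.4f}")
print(f"relative bias: {relative_bias:.2%}")
```

In this sketch the estimate is biased downward because positives are lost; mislabeling negatives as positive would bias it upward in the same way.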

MeSH terms

Benchmarking
Humans
Liquid Biopsy
Neoplasms*
Nucleic Acids*
Polymerase Chain Reaction / methods

Substances

Nucleic Acids
