Negative Binomial Mixture Model for Identification of Noise in Antigen-Specificity Predictions by LIBRA-seq

Perry T Wasdin; Alexandra A Abu-Shmais; Michael W Irvin; Matthew J Vukovich; Ivelin S Georgiev

doi:10.1101/2023.10.13.562258

bioRxiv [Preprint]. 2023 Oct 19:2023.10.13.562258. doi: 10.1101/2023.10.13.562258.

Authors

Perry T Wasdin^{1,2,3}, Alexandra A Abu-Shmais^{3,4}, Michael W Irvin⁵, Matthew J Vukovich^{3,4}, Ivelin S Georgiev^{1,2,3,4}

Affiliations

¹ Program in Chemical and Physical Biology, Vanderbilt University Medical Center, Nashville, TN, USA.
² Program in Computational Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
³ Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
⁴ Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
⁵ Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, USA.

Abstract

Motivation: LIBRA-seq (linking B cell receptor to antigen specificity by sequencing) provides a powerful tool for interrogating the antigen-specific B cell compartment and identifying antibodies against antigen targets of interest. Identification of noise in LIBRA-seq antigen count data is critical for improving antigen binding predictions for downstream applications including antibody discovery and machine learning technologies.

Results: In this study, we present a method for denoising LIBRA-seq data by clustering antigen counts into signal and noise components with a negative binomial mixture model. This approach leverages the VRC01 negative control cells included in a recent LIBRA-seq study (Abu-Shmais et al.) to provide a data-driven means of identifying technical noise. We apply this method to a dataset of nine donors representing separate LIBRA-seq experiments and show that our approach provides improved predictions for in vitro antibody-antigen binding when compared to the standard scoring method used in LIBRA-seq, despite variance in data size and noise structure across samples. This development will improve the ability of LIBRA-seq to identify antigen-specific B cells and contribute to more reliable datasets for future machine learning-based approaches to predicting antibody-antigen binding as the corpus of LIBRA-seq data continues to grow.
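The idea of separating counts into a noise and a signal component can be illustrated with a minimal two-component negative binomial mixture. The sketch below is not the authors' implementation: it is unsupervised and does not use the VRC01 negative control cells, it swaps the exact maximum-likelihood update for a simpler weighted method-of-moments update inside an EM-style loop, and the function names (`nb_logpmf`, `moments_to_nb`, `signal_posterior`, `fit_nb_mixture`) and the toy counts are hypothetical, chosen only for illustration.

```python
import math


def nb_logpmf(k, r, p):
    """Log PMF of a negative binomial: k failures before r successes,
    with success probability p (r > 0, 0 < p < 1)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def moments_to_nb(mean, var):
    """Convert a mean and variance to (r, p), forcing var > mean
    because the negative binomial is always overdispersed."""
    mean = max(mean, 1e-6)
    var = max(var, mean * 1.0001 + 1e-6)
    return mean * mean / (var - mean), mean / var


def signal_posterior(k, weight_signal, noise, signal):
    """Posterior probability that count k came from the signal component."""
    log_n = math.log(1.0 - weight_signal) + nb_logpmf(k, *noise)
    log_s = math.log(weight_signal) + nb_logpmf(k, *signal)
    top = max(log_n, log_s)  # subtract the max for numerical stability
    e_n, e_s = math.exp(log_n - top), math.exp(log_s - top)
    return e_s / (e_n + e_s)


def fit_nb_mixture(counts, n_iter=100):
    """EM-style fit with weighted method-of-moments updates
    (an approximation, not exact maximum likelihood)."""
    overall_mean = sum(counts) / len(counts)
    # Assumed initialization: counts above the overall mean start as signal.
    resp = [1.0 if k > overall_mean else 0.0 for k in counts]
    for _ in range(n_iter):
        # M-step: weighted mean and variance per component -> (r, p).
        fitted = []
        for comp_resp in ([1.0 - z for z in resp], resp):
            w = max(sum(comp_resp), 1e-9)
            mean = sum(z * k for z, k in zip(comp_resp, counts)) / w
            var = sum(z * (k - mean) ** 2
                      for z, k in zip(comp_resp, counts)) / w
            fitted.append(moments_to_nb(mean, var))
        noise, signal = fitted
        weight_signal = min(max(sum(resp) / len(counts), 1e-6), 1.0 - 1e-6)
        # E-step: update each count's probability of being signal.
        resp = [signal_posterior(k, weight_signal, noise, signal)
                for k in counts]
    return weight_signal, noise, signal


# Toy antigen counts invented for illustration (not LIBRA-seq data):
# many low background counts plus a few high counts.
toy_counts = [0, 1, 0, 2, 1, 3, 0, 1, 2, 1, 0, 4, 1, 2, 0, 1,
              150, 220, 180, 310, 95, 260, 200, 175]
weight_signal, noise, signal = fit_nb_mixture(toy_counts)
p_high = signal_posterior(200, weight_signal, noise, signal)
p_low = signal_posterior(1, weight_signal, noise, signal)
```

On these well-separated toy counts, `p_high` should be close to 1 and `p_low` close to 0. A control-informed variant might instead estimate the noise component from counts observed on negative control cells and hold it fixed, but how the preprint incorporates the VRC01 controls is not specified in this abstract.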

Publication types

Preprint
