A machine learning-based approach to determine infection status in recipients of BBV152 (Covaxin) whole-virion inactivated SARS-CoV-2 vaccine for serological surveys

Comput Biol Med. 2022 Jul:146:105419. doi: 10.1016/j.compbiomed.2022.105419. Epub 2022 Apr 25.

Abstract

Data science has been an invaluable part of the COVID-19 pandemic response, with applications ranging from tracking viral evolution to understanding vaccine effectiveness. Asymptomatic breakthrough infections have been a major problem in assessing vaccine effectiveness in populations globally. Serological discrimination of vaccine response from infection has so far been limited to Spike protein vaccines, because whole-virion vaccines generate antibodies against all the viral proteins. Here, we show how a statistical and machine learning (ML) based approach can be used to discriminate between SARS-CoV-2 infection and the immune response to an inactivated whole-virion vaccine (BBV152, Covaxin). For this, we assessed serial data on antibodies against Spike and Nucleocapsid antigens, along with age, sex, number of doses taken, and days since last dose, for 1823 Covaxin recipients. An ensemble ML model, incorporating a consensus clustering approach alongside a support vector machine model, was built on 1063 samples for which reliable qualifying data existed, and then applied to the entire dataset. Of 1448 self-reported negative subjects, our ensemble ML model classified 724 as infected. For method validation, we determined the relative ability of a random subset of samples to neutralize the Delta versus the wild-type strain using a surrogate neutralization assay. We worked on the premise that antibodies generated by a whole-virion vaccine would neutralize the wild-type strain more efficiently than the Delta strain. In 100 of 156 samples where the ML prediction differed from self-reported uninfected status, neutralization against the Delta strain was more effective, indicating infection. We found that 71.8% of subjects were predicted to be infected during the surge, which is concordant with the percentage of sequences classified as Delta (75.6%-80.2%) over the same period. Our approach will help in real-world vaccine effectiveness assessments where whole-virion vaccines are commonly used.
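The validation premise and the reported counts can be illustrated with a minimal Python sketch. It is not the authors' code: the `NeutralizationResult` fields, the `suggests_infection` rule (a plain "Delta greater than wild type" comparison with no threshold), and the two toy samples are assumptions made for illustration only. The counts passed to `percent` (724 of 1448, and 100 of 156) are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class NeutralizationResult:
    """Surrogate neutralization readout for one sample (hypothetical fields)."""
    sample_id: str
    inhibition_wild_type: float  # percent inhibition against the wild-type strain
    inhibition_delta: float      # percent inhibition against the Delta strain


def suggests_infection(result: NeutralizationResult) -> bool:
    # Premise stated in the abstract: vaccine-induced antibodies should
    # neutralize wild type more efficiently than Delta, so stronger Delta
    # neutralization is read as a sign of infection.
    # Assumption: a simple strict comparison; the study's actual criterion
    # is not given in the abstract.
    return result.inhibition_delta > result.inhibition_wild_type


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Toy values invented for illustration; these are NOT study data.
toy_samples = [
    NeutralizationResult("toy-1", inhibition_wild_type=62.0, inhibition_delta=81.0),
    NeutralizationResult("toy-2", inhibition_wild_type=70.0, inhibition_delta=45.0),
]
flags = [suggests_infection(s) for s in toy_samples]

# Proportions implied by the counts reported in the abstract.
reclassified_pct = percent(724, 1448)  # self-reported negatives classified as infected
validated_pct = percent(100, 156)      # discordant samples with stronger Delta neutralization

print(flags)             # [True, False]
print(reclassified_pct)  # 50.0
print(validated_pct)     # 64.1
```

Read this way, the model reclassified half of the self-reported negative subjects, and the neutralization assay supported the model's call in roughly 64% of the discordant samples tested.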

Keywords: BBV152; COVID-19; Covaxin; Ensemble methods; Infection; Machine learning; SARS-CoV-2.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19 Vaccines / therapeutic use
  • COVID-19* / epidemiology
  • COVID-19* / prevention & control
  • Humans
  • Machine Learning
  • Pandemics
  • SARS-CoV-2
  • Vaccines, Inactivated
  • Viral Vaccines*
  • Virion

Substances

  • COVID-19 Vaccines
  • Vaccines, Inactivated
  • Viral Vaccines