Convolutional Neural Networks Based on Sequential Spike Predict the High Human Adaptation of SARS-CoV-2 Omicron Variants

Viruses. 2022 May 17;14(5):1072. doi: 10.3390/v14051072.

Abstract

The COVID-19 pandemic has frequently produced more highly transmissible SARS-CoV-2 variants, such as Omicron, which has produced sublineages. It is a challenge to tell apart high-risk Omicron sublineages and other lineages of SARS-CoV-2 variants. We aimed to build a fine-grained deep learning (DL) model to assess SARS-CoV-2 transmissibility, updating our former coarse-grained model, with the training/validating data of early-stage SARS-CoV-2 variants and based on sequential Spike samples. Sequential amino acid (AA) frequency was decomposed into serially and slidingly windowed fragments in Spike. Unsupervised machine learning approaches were performed to observe the distribution in sequential AA frequency and then a supervised Convolutional Neural Network (CNN) was built with three adaptation labels to predict the human adaptation of Omicron variants in sublineages. Results indicated clear inter-lineage separation and intra-lineage clustering for SARS-CoV-2 variants in the decomposed sequential AAs. Accurate classification by the predictor was validated for the variants with different adaptations. Higher adaptation for the BA.2 sublineage and middle-level adaptation for the BA.1/BA.1.1 sublineages were predicted for Omicron variants. Summarily, the Omicron BA.2 sublineage is more adaptive than BA.1/BA.1.1 and has spread more rapidly, particularly in Europe. The fine-grained adaptation DL model works well for the timely assessment of the transmissibility of SARS-CoV-2 variants, facilitating the control of emerging SARS-CoV-2 variants.

Keywords: Omicron; SARS-CoV-2; adaptation; deep learning; sequential amino acid frequency.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • Humans
  • Neural Networks, Computer
  • Pandemics
  • SARS-CoV-2* / genetics
  • Spike Glycoprotein, Coronavirus / genetics

Substances

  • Spike Glycoprotein, Coronavirus
  • spike protein, SARS-CoV-2

Supplementary concepts

  • SARS-CoV-2 variants

Grants and funding

This research was funded by the National Key Research and Development Program of China, grant number 2021YFC2302004, the National Natural Science Foundation of China (grant no. 32070166) and the Capital’s Funds for Health Improvement and Research, grant number 2021-1G-4302.