Machine Learning Approach Effectively Predicts Binding Between SARS-CoV-2 Spike and ACE2 Across Mammalian Species - Worldwide, 2021

China CDC Wkly. 2021 Nov 12;3(46):967-972. doi: 10.46234/ccdcw2021.235.

Abstract

Introduction: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a recently emerged coronavirus of natural origin that caused the coronavirus disease 2019 (COVID-19) pandemic. Studying its natural origin and host range is particularly important for tracing its source, monitoring the virus, and preventing recurrent infections. One major approach is to test whether angiotensin-converting enzyme 2 (ACE2), the host receptor for the virus, from various species binds the SARS-CoV-2 spike protein, but these experiments are time-consuming and labor-intensive to carry out across a large collection of species.

Methods: In this paper, we applied state-of-the-art machine learning approaches to create a pipeline that predicts binding between ACE2 orthologs from different species and the SARS-CoV-2 spike protein with >87% accuracy.
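The abstract does not specify the pipeline's features or models. As a purely illustrative sketch, and not the authors' method, the example below assumes each species' ACE2 is reduced to a hypothetical aligned string of spike-contacting residues labeled as binding (1) or non-binding (0). It classifies a new ortholog by a k-nearest-neighbor vote under Hamming distance and reports held-out accuracy. The residue strings, labels, and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical training data: invented aligned ACE2 contact-residue strings,
# labeled 1 if the ortholog binds SARS-CoV-2 spike and 0 if it does not.
TRAIN = [
    ("QTFDKHEDY", 1),
    ("QTFDKHEDF", 1),
    ("QSFDKHEDY", 1),
    ("EKLNEYTNH", 0),
    ("EKLNQYTNH", 0),
    ("DKLNEYTNH", 0),
]
# Hypothetical held-out orthologs used to measure accuracy.
TEST = [("QTFDKHENY", 1), ("EKLNEYTDH", 0)]


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned residue strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query: str, k: int = 3) -> int:
    """Predict binding by majority vote of the k closest training orthologs.

    An odd k avoids ties when there are only two labels.
    """
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def evaluate_accuracy(train, test, k: int = 3) -> float:
    """Fraction of held-out orthologs whose binding label is predicted correctly."""
    correct = sum(knn_predict(train, seq, k) == label for seq, label in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"accuracy: {evaluate_accuracy(TRAIN, TEST):.2f}")  # accuracy: 1.00
```

k-nearest neighbors is used here only because it fits in a few lines with no dependencies. The "state-of-the-art" models referred to in the Methods are not identified in the abstract.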

Results: We further validated the prediction pipeline on 2 independent test sets covering >50 bat species and achieved >78% accuracy. A large-scale screening of 204 mammalian species predicted that 144 of them (about 71%) are susceptible to SARS-CoV-2 infection, highlighting the importance of intensive monitoring and study of mammalian species.

Discussion: In short, our study used machine learning models to create a tool for predicting potential hosts of SARS-CoV-2 that, to our knowledge, achieved the highest precision reported in experimental validation. The study also predicted that a wide range of mammals are capable of being infected by SARS-CoV-2.
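The Results report accuracy while the Discussion emphasizes precision, and the two are different metrics. The minimal sketch below uses invented validation labels (1 = ACE2 binds spike, 0 = no binding) to show how both are computed from the same predictions, and how a conservative classifier can reach high precision while its accuracy is lower. The labels and function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means 'binds'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred) -> float:
    """Share of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred) -> float:
    """Share of predicted binders that truly bind (0.0 if none are predicted)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


# Invented validation labels: 1 = ACE2 binds spike, 0 = no binding.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

print(accuracy(y_true, y_pred))   # 0.75
print(precision(y_true, y_pred))  # 1.0
```

Here the classifier never calls a non-binder a binder (precision 1.0) but misses two true binders, so accuracy drops to 0.75.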

Keywords: ACE2; SARS-CoV-2; machine learning.

Grants and funding

The Strategic Priority Research Programs of the Chinese Academy of Sciences (XDB29020000), the National Natural Science Foundation of China (32041009) and Key R&D Program of Shandong Province (2020CXGC011305)