Simcryocluster: a semantic similarity clustering method of cryo-EM images by adopting contrastive learning

BMC Bioinformatics. 2024 Feb 20;25(1):77. doi: 10.1186/s12859-023-05565-w.

Abstract

Background: Cryo-electron microscopy (cryo-EM) plays an increasingly important role in determining the three-dimensional (3D) structures of macromolecules. Reconstructions close to atomic resolution depend on 2D classification of single-particle images. This step helps with particle selection, and it also directly affects the quality of the 3D reconstruction. The task is to cluster and align 2D single-particle images into homogeneous groups, so that averaging each group yields a sharper image. Three things make this hard: cryo-EM single-particle images have a low signal-to-noise ratio (SNR), the data cannot be labeled manually, and the projection directions are random and follow an unknown distribution. A key problem in cryo-EM 2D single-particle analysis is therefore how to extract the characteristic features of valid particles at low SNR. Solving it improves clustering accuracy and, in turn, reconstruction accuracy.
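Why averaging a homogeneous class sharpens the image: suppose the images in a class are perfectly aligned views of the same projection, each with independent zero-mean noise. Averaging N of them leaves the signal unchanged and divides the noise variance by N, so the SNR grows roughly N-fold. The minimal sketch below checks this on synthetic data. The Gaussian-blob "particle", the noise level, and the image count are illustrative assumptions, not cryo-EM data. Real class averaging must also cope with misalignment and heterogeneity.

```python
import numpy as np

def snr(clean, noisy):
    """SNR estimated as mean signal power divided by mean residual-noise power."""
    noise = noisy - clean
    return float(np.mean(clean ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
# Toy "particle projection": a 2D Gaussian blob on a 64x64 grid (illustrative only).
y, x = np.mgrid[-32:32, -32:32]
clean = np.exp(-(x ** 2 + y ** 2) / (2 * 6.0 ** 2))

# Simulate 100 perfectly aligned copies of the same view under heavy Gaussian noise.
n_images = 100
stack = clean + rng.normal(0.0, 1.0, size=(n_images, *clean.shape))

snr_single = snr(clean, stack[0])
snr_average = snr(clean, stack.mean(axis=0))
print(f"single image SNR: {snr_single:.3f}, class average SNR: {snr_average:.3f}")
```

With 100 images the ratio `snr_average / snr_single` comes out close to 100. This is why correct grouping and alignment matter more than any single noisy image.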

Results: To address these problems, we propose a learnable deep clustering method and a fast alignment and weighted averaging method in the frequency domain. Together they improve the class averages and the reconstruction accuracy. The improvement is especially prominent in the feature extraction and dimensionality reduction module. Classification methods based on Bayesian inference and maximum likelihood need a large amount of single-particle data to estimate the relative angular orientations of macromolecular particles in the 3D structure. Compared with these methods, our proposed clustering method shows good results.
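The abstract does not give the alignment and weighting details. As a hedged illustration of why the frequency domain makes alignment fast, the sketch below estimates the translation between each image and a reference with FFT-based cross-correlation. This costs O(n log n) per image instead of scoring every shift directly. It then forms a weighted average of the aligned images. The sketch rests on several assumptions:

- It handles only integer, circular translations (no in-plane rotation).
- The reference is a known template.
- Weighting each image by its correlation peak is a hypothetical choice.
- `align_translation` and `weighted_class_average` are illustrative names, not the authors' code.

```python
import numpy as np

def align_translation(ref, img):
    """Return (shift, score): the circular shift that best aligns img to ref, via FFT cross-correlation."""
    # Cross-correlation theorem: c[k] = sum_n ref[n + k] * img[n] = IFFT(FFT(ref) * conj(FFT(img)))
    corr = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(img))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # DFT indices wrap around; convert them to signed shifts.
    shift = tuple(int(p) if p <= s // 2 else int(p) - s for p, s in zip(peak, corr.shape))
    return shift, float(corr.max())

def weighted_class_average(ref, images):
    """Align every image to ref, then average with weights proportional to the correlation peak (hypothetical)."""
    aligned, weights = [], []
    for img in images:
        shift, score = align_translation(ref, img)
        aligned.append(np.roll(img, shift, axis=(0, 1)))  # apply the estimated shift
        weights.append(max(score, 0.0))                   # ignore negative correlations
    w = np.asarray(weights) / np.sum(weights)
    return np.tensordot(w, np.stack(aligned), axes=1)     # sum_i w_i * aligned_i

# Toy asymmetric template and 50 noisy, randomly shifted copies of it.
rng = np.random.default_rng(1)
yy, xx = np.mgrid[-32:32, -32:32]
template = np.exp(-((xx - 5) ** 2 + yy ** 2) / 50.0) + 0.5 * np.exp(-(xx ** 2 + (yy - 8) ** 2) / 20.0)
shifts = rng.integers(-6, 7, size=(50, 2))
images = [np.roll(template, tuple(s), axis=(0, 1)) + rng.normal(0.0, 0.5, template.shape) for s in shifts]

avg = weighted_class_average(template, images)
first_shift, _ = align_translation(template, images[0])
mse_single = float(np.mean((np.roll(images[0], first_shift, axis=(0, 1)) - template) ** 2))
mse_avg = float(np.mean((avg - template) ** 2))
print(f"MSE single aligned image: {mse_single:.4f}, MSE weighted average: {mse_avg:.4f}")
```

The returned shift is the roll to apply to `img`. For example, an image rolled by `(3, -4)` from the template yields `(-3, 4)`. Noise can move the correlation peak by a pixel for smooth particles. This is one reason practical pipelines refine alignment iteratively.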

Conclusions: Using contrastive learning, SimcryoCluster performs well on classifying unlabeled, high-noise cryo-EM single-particle images, making it an important tool for cryo-EM protein structure determination.
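Contrastive learning needs no labels. It trains an encoder to pull together the embeddings of two augmented views of the same particle image and to push apart the embeddings of different images. The abstract does not state the exact objective. The minimal sketch below assumes the widely used NT-Xent (normalized temperature-scaled cross-entropy) loss introduced with SimCLR. It operates on precomputed embeddings, so the encoder, the image augmentations, and the `temperature` value are assumptions outside the sketch.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]); all other embeddings in the batch act as negatives."""
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors, so dot product = cosine similarity
    sim = z @ z.T / temperature                        # (2N, 2N) temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude each embedding's similarity with itself
    n = z1.shape[0]
    # Row i's positive partner is i + N (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    return float(-log_prob_pos.mean())

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))                                        # embeddings of view 1
aligned_loss = nt_xent_loss(z1, z1 + 0.05 * rng.normal(size=z1.shape))  # views that agree
random_loss = nt_xent_loss(z1, rng.normal(size=z1.shape))               # unrelated "views"
print(f"loss for agreeing views: {aligned_loss:.3f}, for unrelated views: {random_loss:.3f}")
```

The loss is lower when the two views of each image map to nearby embeddings. Minimizing it over many augmented pairs is what yields features that can then be clustered.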

Keywords: 2D classification; Contrastive learning; Cryo-EM; Protein structure determination.

MeSH terms

  • Bayes Theorem
  • Cluster Analysis
  • Cryoelectron Microscopy / methods
  • Image Processing, Computer-Assisted* / methods
  • Macromolecular Substances
  • Semantics*

Substances

  • Macromolecular Substances