Prediction of Protein Secondary Structures Based on Substructural Descriptors of Molecular Fragments

Int J Mol Sci. 2024 Nov 21;25(23):12525. doi: 10.3390/ijms252312525.

Abstract

The accurate prediction of secondary structures of proteins (SSPs) is a critical challenge in molecular biology and structural bioinformatics. Despite recent advances, the task remains complex and warrants further exploration. This study presents a novel approach to SSP prediction that uses atom-centric substructural multilevel neighborhoods of atoms (MNA) descriptors computed for protein molecular fragments. A dataset of over 335,000 SSPs from 37,000 proteins, annotated by the Dictionary of Secondary Structure in Proteins (DSSP) software, was constructed from Protein Data Bank (PDB) records with a resolution of 2 Å or better. Protein fragments were converted into structural formulae using the RDKit Python package and stored in SD files in the MOL V3000 format. Classification models of sequence–structure–property relationships (SSPR) were developed using MNA descriptors of varying levels and the Bayesian algorithm implemented in the MultiPASS software. The average prediction accuracy over the eight SSP types, expressed as the area under the ROC curve (AUC) and estimated by leave-one-out cross-validation, was 0.902. On the independent test sets, the best SSPR models achieved AUC, Q3, and Q8 values of 0.860, 77.32%, and 70.92% on ASTRAL and 0.889, 78.78%, and 74.74% on CB513. Based on these models, a freely available web application, MNA-PSS-Pred, was developed.
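The abstract does not spell out how MNA descriptors of "varying levels" are built. The sketch below assumes the usual recursive definition: the level-0 descriptor of an atom is its label, and the level-k descriptor is the atom label followed by the sorted, parenthesized level-(k-1) descriptors of its bonded neighbors. It is a simplified, hypothetical illustration, not the MultiPASS implementation: it ignores hydrogens, bond orders, and the ring/chain markers used in PASS, and the function names and the glycine graph are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple


def mna_descriptors(labels: Sequence[str],
                    bonds: Sequence[Tuple[int, int]],
                    level: int) -> List[str]:
    """Simplified MNA-style descriptor of every atom at the given level.

    Hypothetical sketch: level 0 is the atom label; level k is
    label + "(" + sorted level-(k-1) descriptors of neighbors + ")".
    Hydrogens, bond orders and ring/chain marks are deliberately ignored.
    """
    # Build an undirected adjacency list from the bond pairs.
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(labels))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    current = list(labels)  # level-0 descriptors are the labels themselves
    for _ in range(level):
        # The new list is built entirely from the previous level's descriptors.
        current = [
            labels[i] + "(" + "".join(sorted(current[j] for j in neighbors[i])) + ")"
            for i in range(len(labels))
        ]
    return current


def fragment_descriptor_set(labels: Sequence[str],
                            bonds: Sequence[Tuple[int, int]],
                            level: int) -> List[str]:
    """Represent a fragment as its sorted set of unique atom descriptors."""
    return sorted(set(mna_descriptors(labels, bonds, level)))


if __name__ == "__main__":
    # Hypothetical heavy-atom graph of free glycine: N-CA, CA-C, C=O, C-OXT.
    gly_labels = ["N", "C", "C", "O", "O"]
    gly_bonds = [(0, 1), (1, 2), (2, 3), (2, 4)]
    for lvl in (1, 2):
        print(lvl, fragment_descriptor_set(gly_labels, gly_bonds, lvl))
```

For the glycine graph, level 1 yields the descriptors `C(CN)`, `C(COO)`, `N(C)`, and `O(C)`. Raising the level widens the neighborhood each descriptor encodes, which is the knob the abstract refers to when it mentions models built with varying levels of MNA descriptors.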

Keywords: MNA descriptors; MultiPASS; prediction of secondary structures of protein; sequence–structure–property relationships.

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Computational Biology / methods
  • Databases, Protein*
  • Models, Molecular
  • Protein Structure, Secondary*
  • Proteins* / chemistry
  • Software*

Substances

  • Proteins