DPAM-AI: a domain parser for AlphaFold models powered by artificial intelligence

Bioinformatics. 2024 Dec 26;41(1):btae740. doi: 10.1093/bioinformatics/btae740.

Abstract

Motivation: Due to the breakthrough in protein structure prediction by AlphaFold, the scientific community has access to 200 million predicted protein structures with near-atomic accuracy from the AlphaFold protein structure DataBase (AFDB), covering nearly the entire protein universe. Segmenting these models into domains and classifying them into an evolutionary hierarchy hold tremendous potential for unraveling essential insights into protein function.

Results: We introduce DPAM-AI, a Domain Parser for AlphaFold Models based on Artificial Intelligence. DPAM-AI utilizes a convolutional neural network trained with previously classified domains in the Evolutionary Classification Of protein Domains (ECOD) database. DPAM-AI integrates inter-residue distances, predicted aligned errors, and sequence and structural alignments to previously classified domains detected via sequence (HHsuite) and structural (Dali) similarity searches. DPAM-AI has demonstrated its power through rigorous tests, excelling in several benchmark sets compared to its predecessor, DPAM, and other recently published domain parsers, Merizo and Chainsaw. We applied DPAM-AI to representative AFDB models for proteins classified in Pfam. We obtained representative 3D structures for 18 487 (89%) of the 20 795 Pfam families. The remaining families either (i) belong to viral proteins that were excluded from AFDB or (ii) do not adopt globular 3D structures. Our structure-aware domain delineation uncovered a considerable fraction (15%) of Pfam domains containing multiple structural and evolutionary units and refined the boundaries for over half.

Availability and implementation: Pfam and corresponding DPAM-AI domains are at http://prodata.swmed.edu/DPAM-pfam/. Our code is deposited at https://github.com/Jsauce5p/DPAM/tree/dpam_ai, and updates will be released through https://github.com/CongLabCode/DPAM.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Computational Biology / methods
  • Databases, Protein*
  • Neural Networks, Computer
  • Protein Domains
  • Protein Folding
  • Proteins* / chemistry
  • Software

Substances

  • Proteins