An integrative computational model for large-scale identification of metalloproteins in microbial genomes: a focus on iron-sulfur cluster proteins

Metallomics. 2014 Oct;6(10):1913-30. doi: 10.1039/c4mt00156g. Epub 2014 Aug 13.

Abstract

Metalloproteins are a ubiquitous group of molecules that are crucial to the survival of all living organisms. Although several metal-binding motifs have been defined, it remains challenging to confidently identify metalloproteins from primary protein sequences using computational approaches alone. Here, we describe a comprehensive machine learning strategy to design and assess a penalized generalized linear model, and we use it to detect members of the iron-sulfur cluster protein family. A new category of descriptors, built on profile hidden Markov models and encoding structural information, was combined with public descriptors into a linear model. The model was trained and tested on distinct datasets composed of well-characterized iron-sulfur protein sequences, and it provided higher sensitivity than a motif-based approach while maintaining a good level of specificity. Analysis of this linear model allows us to detect and quantify the contribution of each descriptor, providing a better understanding of this complex protein family along with valuable indications for further experimental characterization. Two newly identified proteins, YhcC and YdiJ, were functionally validated as genuine iron-sulfur proteins, confirming the prediction. The computational model was then applied to over 550 prokaryotic genomes to screen for iron-sulfur proteomes; the results are publicly available at: . This study represents a proof of concept for the application of a penalized linear model to identify metalloprotein superfamilies on a large scale. The application employed here, screening for iron-sulfur proteomes, provides new candidates for further biochemical and structural analysis as well as new resources for an extensive exploration of iron-sulfuromes in the microbial world.
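
To illustrate the general idea of a penalized generalized linear model whose coefficients quantify the contribution of each descriptor, here is a minimal sketch of an L1-penalized logistic regression fitted by proximal gradient descent. It is not the authors' model: the abstract says only that the model is "penalized", so the L1 penalty is an assumption, as are the three descriptor names, the synthetic data, and all parameter values. The model is fitted on one dataset and evaluated on a distinct one, and sensitivity and specificity are reported.

```python
import math
import random

# Hypothetical descriptor names, for illustration only (not from the paper).
DESCRIPTORS = ["hmm_profile_score", "cys_motif_score", "uninformative_noise"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.2, n_iter=1000):
    """Fit logistic regression with an L1 penalty on the weights.

    Each iteration takes a gradient step on the mean log-loss, then applies
    soft-thresholding to the weights. The intercept is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (predicted iron-sulfur protein) or 0."""
    p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
    return 1 if p >= threshold else 0


def make_dataset(n, rng):
    """Synthetic descriptors: two informative features and one pure noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        sign = 1.0 if label else -1.0
        X.append([rng.gauss(1.5 * sign, 1.0),
                  rng.gauss(1.0 * sign, 1.0),
                  rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_dataset(200, rng)   # training set
X_test, y_test = make_dataset(100, rng)     # distinct test set

w, b = fit_l1_logistic(X_train, y_train)
pred = [predict(w, b, x) for x in X_test]

tp = sum(1 for p, t in zip(pred, y_test) if p == 1 and t == 1)
fn = sum(1 for p, t in zip(pred, y_test) if p == 0 and t == 1)
tn = sum(1 for p, t in zip(pred, y_test) if p == 0 and t == 0)
fp = sum(1 for p, t in zip(pred, y_test) if p == 1 and t == 0)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

for name, weight in zip(DESCRIPTORS, w):
    print(f"{name}: weight = {weight:.3f}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

In a sketch like this, the magnitude and sign of each fitted weight give a direct reading of how much a descriptor contributes to the prediction, which is the interpretability property the abstract emphasizes. The penalty shrinks weights toward zero, so weakly informative descriptors tend to receive small or zero coefficients.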

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Computer Simulation
  • Databases, Protein
  • Genome, Microbial*
  • Iron-Sulfur Proteins / chemistry
  • Iron-Sulfur Proteins / genetics
  • Iron-Sulfur Proteins / metabolism*
  • Markov Chains
  • Models, Biological

Substances

  • Iron-Sulfur Proteins