CW-PRED: Prediction of C-terminal surface anchoring sorting signals in bacteria and Archaea

J Bioinform Comput Biol. 2024 Aug;22(4):2450021. doi: 10.1142/S0219720024500215. Epub 2024 Aug 31.

Abstract

Sorting signals are crucial for the anchoring of proteins to the cell surface in Archaea and bacteria. These proteins often feature distinct motifs at their C-terminus, which are cleaved by sortases or sortase-like enzymes. Gram-positive bacteria exhibit the LPXTGX consensus motif, cleaved by sortases, while Gram-negative bacteria employ exosortases that recognize motifs such as PEP. Archaea utilize exosortase homologs known as archaeosortases for surface anchoring. Traditionally, identification of such C-terminal sorting signals has been performed with profile Hidden Markov Models (pHMMs). The Cell-Wall PREDiction (CW-PRED) method introduced, for the first time, a custom-made class HMM for proteins in Gram-positive bacteria that contain a cell wall sorting signal, which begins with an LPXTG motif followed by a hydrophobic domain and a tail of positively charged residues. Here we present a new and updated version of CW-PRED for predicting C-terminal sorting signals in Archaea, Gram-positive bacteria, and Gram-negative bacteria. We used a large training set and several model enhancements that improve motif identification in order to achieve better discrimination between proteins carrying C-terminal signals and other proteins. Cross-validation demonstrates CW-PRED's superiority in sensitivity and specificity compared to other methods. Application of the method to reference proteomes reveals a large number of potential surface proteins not previously identified. The method is available for academic use at http://195.251.108.230/apps.compgen.org/CW-PRED/ and as standalone software.
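
The tripartite Gram-positive signal described above (an LPXTG motif, then a hydrophobic domain, then a positively charged tail) can be illustrated with a naive rule-based scan. This is only a sketch of the signal architecture and is not the CW-PRED class HMM. The function name `find_cterm_signal`, the hydrophobic residue set, the window and tail lengths, and all thresholds are assumptions chosen for illustration and are not taken from the paper.

```python
import re

# Illustrative assumptions only; none of these values come from CW-PRED.
MOTIF = re.compile(r"LP.TG")       # LPXTG, with X as any residue
HYDROPHOBIC = set("AVLIMFWCG")     # hypothetical hydrophobic alphabet
POSITIVE = set("KR")               # positively charged residues


def find_cterm_signal(seq, tail_window=50, tail_len=6,
                      min_core_len=10, min_hydrophobic=0.6, min_positive=2):
    """Return a dict describing a candidate sorting signal, or None.

    Heuristic: take the last LPXTG match starting in the final
    `tail_window` residues, treat the final `tail_len` residues as the
    charged tail, and treat everything in between as the hydrophobic core.
    """
    seq = seq.upper()
    window_start = max(0, len(seq) - tail_window)
    matches = list(MOTIF.finditer(seq, window_start))
    if not matches:
        return None
    m = matches[-1]

    downstream = seq[m.end():]
    if len(downstream) < min_core_len + tail_len:
        return None
    core, tail = downstream[:-tail_len], downstream[-tail_len:]

    hydrophobic_fraction = sum(r in HYDROPHOBIC for r in core) / len(core)
    positive_count = sum(r in POSITIVE for r in tail)
    if hydrophobic_fraction < min_hydrophobic or positive_count < min_positive:
        return None

    return {
        "motif": m.group(),
        "motif_start": m.start(),      # 0-based index into seq
        "core": core,
        "tail": tail,
        "hydrophobic_fraction": hydrophobic_fraction,
        "positive_count": positive_count,
    }


if __name__ == "__main__":
    # Synthetic sequence built for this example, not a real protein.
    example = "M" + "A" * 60 + "LPETG" + "AAVLLGAIVLAGLLLF" + "RRKKEN"
    print(find_cterm_signal(example))
```

A fixed-threshold scan like this cannot weigh partial evidence across the motif, core, and tail; a probabilistic model such as an HMM scores the whole signal jointly, which the abstract gives as the basis for better discrimination.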

Keywords: Archaea; C-terminal surface-anchoring sorting signals; Gram-negative bacteria; Gram-positive bacteria; Hidden Markov Model; sequence analysis; sortases.

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Archaea / genetics
  • Archaea / metabolism
  • Archaeal Proteins* / chemistry
  • Archaeal Proteins* / genetics
  • Archaeal Proteins* / metabolism
  • Bacteria / genetics
  • Bacteria / metabolism
  • Bacterial Proteins* / chemistry
  • Bacterial Proteins* / genetics
  • Bacterial Proteins* / metabolism
  • Cell Wall / chemistry
  • Cell Wall / metabolism
  • Computational Biology / methods
  • Markov Chains
  • Protein Sorting Signals*
  • Software

Substances

  • Protein Sorting Signals
  • Bacterial Proteins
  • Archaeal Proteins