Using weakly conserved motifs hidden in secretion signals to identify type-III effectors from bacterial pathogen genomes

PLoS One. 2013;8(2):e56632. doi: 10.1371/journal.pone.0056632. Epub 2013 Feb 20.

Abstract

Background: As one of the most important virulence factor types in gram-negative pathogenic bacteria, type-III effectors (TTEs) play a crucial role in pathogen-host interactions by directly influencing immune signaling pathways within host cells. Based on the hypothesis that type-III secretion signals may be composed of weakly conserved sequence motifs, we used profile-based amino acid pair information to develop an accurate TTE predictor.

Results: For each TTE or non-TTE, we first used a hidden Markov model-based sequence search method (HHblits) to detect its weakly homologous sequences and extracted the profile-based k-spaced amino acid pair composition (HH-CKSAAP) from the N-terminal sequence. The HH-CKSAAP feature vector was then used to train a linear support vector machine model, which we designate as BEAN (Bacterial Effector ANalyzer). We compared our method with four existing TTE predictors on an independent test set, and our method showed improved performance. Furthermore, we listed the most predictive amino acid pairs according to their weights in the established classification model. Evolutionary analysis shows that predictive amino acid pairs tend to be more conserved. Some predictive amino acid pairs also show significantly different position distributions between TTEs and non-TTEs. These analyses support the hypothesis that weakly conserved sequence motifs may play important roles in type-III secretion signals. Finally, we used BEAN to scan one plant pathogen genome and showed that BEAN can be used for genome-wide TTE identification. The webserver and stand-alone version of BEAN are available at http://protein.cau.edu.cn:8080/bean/.
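
To illustrate the k-spaced amino acid pair composition underlying the feature vector, the following is a minimal Python sketch. It is a simplification and not the BEAN implementation: it counts pairs directly in the raw sequence rather than in an HHblits-derived profile, and the function name `cksaap`, the N-terminal length `n_term=50`, and the maximum gap `k_max=2` are assumptions chosen for illustration only, since the abstract does not state these values.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered amino acid pairs, e.g. "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, k_max=2, n_term=50):
    """Sequence-based k-spaced amino acid pair composition (illustrative).

    Returns a dict mapping (pair, k) to the fraction of k-spaced windows
    in the first `n_term` residues whose two end residues form `pair`.
    `k_max` and `n_term` are hypothetical defaults, not values from the paper.
    """
    region = sequence[:n_term].upper()
    features = {}
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        # A k-spaced pair uses positions i and i + k + 1.
        n_windows = max(len(region) - k - 1, 0)
        for i in range(n_windows):
            pair = region[i] + region[i + k + 1]
            if pair in counts:  # skip non-standard residues such as X
                counts[pair] += 1
        for pair in PAIRS:
            features[(pair, k)] = counts[pair] / n_windows if n_windows else 0.0
    return features


if __name__ == "__main__":
    feats = cksaap("MAAA", k_max=1)
    print(feats[("AA", 0)], feats[("MA", 1)])
```

In a profile-based variant such as HH-CKSAAP, the hard residue counts would be replaced by quantities derived from the per-position profile; the exact formulation is given in the full paper rather than in this abstract.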

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence / genetics*
  • Bacterial Secretion Systems / genetics*
  • Computational Biology / methods
  • Conserved Sequence / genetics
  • Genome, Bacterial
  • Gram-Negative Bacteria / genetics
  • Gram-Negative Bacteria / pathogenicity*
  • Host-Pathogen Interactions / genetics*
  • Plants / microbiology
  • Sequence Homology, Amino Acid
  • Signal Transduction
  • Support Vector Machine

Substances

  • Bacterial Secretion Systems

Grants and funding

This work was supported by grants from the National Natural Science Foundation of China (31271414 and 31070259), the National Key Basic Research Program of China (2009CB918802 and 2012CB114104), and the Chinese Universities Scientific Fund (2012QJ146). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.