DeepDom: Predicting protein domain boundary from sequence alone using stacked bidirectional LSTM

Pac Symp Biocomput. 2019:24:66-75.

Abstract

Protein domain boundary prediction is usually an early step in understanding protein function and structure. Most current computational domain boundary prediction methods suffer from low accuracy and limitations in handling multi-domain types, and some cannot be applied at all to certain targets, such as proteins with discontinuous domains. We developed an ab-initio protein domain predictor, DeepDom, using a stacked bidirectional LSTM deep learning model. Our model is trained on a large number of protein sequences without feature engineering such as sequence profiles. Hence, prediction with our method is much faster than with other methods, and the trained model can be applied to any type of target protein without constraint. We evaluated DeepDom by 10-fold cross-validation and by applying it to targets in different categories from CASP 8 and CASP 9. Comparison with other methods shows that DeepDom outperforms most current ab-initio methods and, in certain cases, even achieves better results than the top-level template-based method. The DeepDom code and the CASP 8 and CASP 9 test data we used are available on GitHub at https://github.com/yuexujiang/DeepDom.
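
As an illustration of what "sequence alone" input could look like, the following minimal Python sketch encodes a raw amino acid sequence as per-residue one-hot vectors and builds per-residue boundary labels. It is an assumption-laden sketch and not the DeepDom implementation: the function names `one_hot_encode` and `boundary_labels`, the 21-slot encoding, the `margin` parameter, and the example sequence are all hypothetical and are not taken from the paper or its repository.

```python
# Hypothetical sketch: names and encoding are illustrative assumptions,
# not taken from the DeepDom paper or repository.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 21-dimensional vector per residue.

    The last slot marks any residue outside the 20 standard ones.
    """
    vectors = []
    for residue in sequence.upper():
        vec = [0] * (len(AMINO_ACIDS) + 1)
        vec[AA_INDEX.get(residue, len(AMINO_ACIDS))] = 1
        vectors.append(vec)
    return vectors


def boundary_labels(length, boundaries, margin=0):
    """Label residue i (0-based) as 1 if it lies within `margin`
    positions of any boundary index, else 0."""
    labels = [0] * length
    for b in boundaries:
        for i in range(max(0, b - margin), min(length, b + margin + 1)):
            labels[i] = 1
    return labels


if __name__ == "__main__":
    seq = "MKTAYIAKQR"  # made-up example sequence
    x = one_hot_encode(seq)
    y = boundary_labels(len(seq), boundaries=[4], margin=1)
    print(len(x), len(x[0]))
    print(y)
```

Per-residue inputs and labels of this shape are the kind of data a sequence-labeling model such as a bidirectional LSTM would consume; no profile or alignment step is involved, which is consistent with the speed advantage described above.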

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computational Biology
  • Databases, Protein
  • Deep Learning*
  • Neural Networks, Computer
  • Protein Domains*