Signal Peptides Generated by Attention-Based Neural Networks

ACS Synth Biol. 2020 Aug 21;9(8):2154-2161. doi: 10.1021/acssynbio.0c00219. Epub 2020 Jul 27.

Abstract

Signal peptides (SPs), short (15-30 residue) chains of amino acids at the amino termini of expressed proteins, specify secretion in living cells. We trained an attention-based neural network, the Transformer model, on data from all available organisms in Swiss-Prot to generate SP sequences. Experimental testing demonstrates that the model-generated SPs are functional: when appended to enzymes expressed in an industrial Bacillus subtilis strain, the SPs lead to secreted activity that is competitive with industrially used SPs. Additionally, the model-generated SPs are diverse in sequence, sharing as little as 58% sequence identity with the closest known native signal peptide and 73% ± 9% on average.
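
The abstract reports sequence identity to the closest known native signal peptide but does not say how identity was computed. The following is a minimal sketch of one common way to do it, assuming a global (Needleman-Wunsch) alignment with unit match, mismatch, and gap scores, and identity defined as matching columns divided by alignment length. The function names, the scoring scheme, and the toy sequences are illustrative assumptions, not the authors' pipeline.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with linear gap penalty (assumed scoring)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned residues as a percentage of alignment length."""
    aln_a, aln_b = global_align(a, b)
    if not aln_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


def closest_native(generated, natives):
    """Return (identity, sequence) for the native SP most similar to `generated`."""
    return max(((percent_identity(generated, s), s) for s in natives),
               key=lambda pair: pair[0])


# Toy sequences for illustration only; they are not real signal peptides.
natives = ["MKKLA", "MRRLL"]
best_identity, best_match = closest_native("MKKLA", natives)
```

Other definitions of identity (for example, dividing by the shorter sequence length, or aligning with a substitution matrix such as BLOSUM62) would give different percentages, so figures computed this way are not directly comparable to the 58% and 73% ± 9% reported above.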

Keywords: Bacillus subtilis; machine learning; protein design; secretion; signal peptides.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Area Under Curve
  • Bacillus subtilis / metabolism
  • Bacterial Proteins / metabolism
  • Databases, Protein
  • Machine Learning*
  • Protein Sorting Signals*
  • ROC Curve

Substances

  • Bacterial Proteins
  • Protein Sorting Signals