PARROT is a flexible recurrent neural network framework for analysis of large protein datasets

eLife. 2021 Sep 17;10:e70576. doi: 10.7554/eLife.70576.

Abstract

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret these data. Among these tools, machine learning approaches are increasingly used because of their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while requiring only raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable to a wide array of biological problems.
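
As a rough illustration of what "raw protein sequences as input" can mean for a recurrent network, the sketch below turns a sequence into a per-residue numeric matrix using one-hot encoding. This is an assumption made for illustration: the abstract does not say which encoding PARROT uses, and the names `AMINO_ACIDS`, `AA_INDEX`, and `one_hot_encode` are hypothetical, not part of PARROT's API.

```python
# Hypothetical illustration only; not PARROT's actual code or API.
# Assumption: each residue is one-hot encoded over the 20 standard amino acids.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return a len(sequence) x 20 matrix (list of lists) of 0.0/1.0 values."""
    encoded = []
    for residue in sequence.upper():
        if residue not in AA_INDEX:
            # Non-standard residues are rejected in this sketch.
            raise ValueError(f"Unsupported residue: {residue!r}")
        row = [0.0] * len(AMINO_ACIDS)
        row[AA_INDEX[residue]] = 1.0
        encoded.append(row)
    return encoded


matrix = one_hot_encode("MKV")
print(len(matrix), len(matrix[0]))  # 3 20
```

A matrix of this shape (sequence length by 20) is the kind of input a recurrent layer could consume one residue at a time. A classification task would then map the network's output to class labels, and a regression task to continuous values.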

Keywords: bioinformatics; computational biology; functional annotation; high-throughput methods; human; machine learning; proteomics; systems biology.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology / methods*
  • Databases, Protein*
  • Deep Learning
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Neural Networks, Computer*
  • Phosphorylation
  • Proteins / analysis
  • Proteins / chemistry
  • Proteins / metabolism
  • Sequence Analysis, Protein / methods*
  • Software

Substances

  • Proteins

Grants and funding

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.