Sequence-to-function deep learning frameworks for engineered riboregulators

Jacqueline A Valeri; Katherine M Collins; Pradeep Ramesh; Miguel A Alcantar; Bianca A Lepe; Timothy K Lu; Diogo M Camacho

doi:10.1038/s41467-020-18676-2

Nat Commun. 2020 Oct 7;11(1):5058. doi: 10.1038/s41467-020-18676-2.

Authors

Jacqueline A Valeri^#¹,², Katherine M Collins^#¹,³, Pradeep Ramesh^#¹, Miguel A Alcantar², Bianca A Lepe¹,², Timothy K Lu⁴,⁵, Diogo M Camacho⁶

Affiliations

¹ Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA.
² Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
³ Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
⁴ Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
⁵ Synthetic Biology Group, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
⁶ Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA.

^# Contributed equally.

Abstract

While synthetic biology has revolutionized our approaches to medicine, agriculture, and energy, the design of completely novel biological circuit components beyond naturally derived templates remains challenging due to poorly understood design rules. Toehold switches, which are programmable nucleic acid sensors, face an analogous design bottleneck; our limited understanding of how sequence impacts functionality often necessitates expensive, time-consuming screens to identify effective switches. Here, we introduce Sequence-based Toehold Optimization and Redesign Model (STORM) and Nucleic-Acid Speech (NuSpeak), two orthogonal and synergistic deep learning architectures to characterize and optimize toeholds. Applying techniques from computer vision and natural language processing, we 'un-box' our models using convolutional filters, attention maps, and in silico mutagenesis. Through transfer learning, we redesign sub-optimal toehold sensors, even with sparse training data, experimentally validating their improved performance. This work provides sequence-to-function deep learning frameworks for toehold selection and design, augmenting our ability to construct potent biological circuit components and precision diagnostics.
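
The in silico mutagenesis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_score` is a hypothetical stand-in for a trained sequence-to-function model (it simply returns GC fraction), and the function names and the four-base RNA alphabet are assumptions made for illustration. The idea shown is general: substitute every position with every alternative base and record how the predicted score changes relative to the unmutated sequence.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGU"  # RNA alphabet (assumed here, since toehold switches are RNA)


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


def in_silico_mutagenesis(
    seq: str, score_fn: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Score every single-base substitution relative to the original sequence.

    Returns a mapping (position, substituted base) -> change in score.
    """
    baseline = score_fn(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue  # skip the unmutated base
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = score_fn(mutant) - baseline
    return effects


# Usage: scan a short illustrative sequence (not taken from the paper).
effects = in_silico_mutagenesis("AUGC", toy_score)
```

With a real model in place of `toy_score`, positions whose substitutions produce large score changes would be the ones the model treats as most important for function.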

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Base Sequence / genetics
Biotechnology / methods*
Computer Simulation
Datasets as Topic
Deep Learning*
Genetic Engineering / methods*
Genome, Human / genetics
Genome, Viral / genetics
Humans
Models, Genetic
Mutagenesis
Natural Language Processing
Riboswitch / genetics*
Structure-Activity Relationship
Synthetic Biology / methods*

Substances

Riboswitch