Machine Learning Ensemble Directed Engineering of Genetically Encoded Fluorescent Calcium Indicators

Res Sq [Preprint]. 2023 Aug 7:rs.3.rs-3146778. doi: 10.21203/rs.3.rs-3146778/v1.

Abstract

In this study, we focused on the transformative potential of machine learning in the engineering of genetically encoded fluorescent indicators (GEFIs), protein-based sensing tools that are critical for real-time monitoring of biological activity. GEFIs are complex proteins with multiple dynamic states, rendering optimization by trial-and-error mutagenesis a challenging problem. We applied an alternative approach using machine learning to predict the outcomes of sensor mutagenesis by analyzing established libraries that link sensor sequences to functions. Using the GCaMP calcium indicator as a scaffold, we developed an ensemble of three regression models trained on experimentally derived GCaMP mutation libraries. We used the trained ensemble to perform an in silico functional screen on 1423 novel, uncharacterized GCaMP variants. As a result, we identified the novel ensemble-derived GCaMP (eGCaMP) variants, eGCaMP and eGCaMP+, that achieve both faster kinetics and larger fluorescent responses upon stimulation than previously published fast variants. Furthermore, we identified a combinatorial mutation with extraordinary dynamic range, eGCaMP2+, that outperforms the tested 6th, 7th, and 8th generation GCaMPs. These findings demonstrate the value of machine learning as a tool to facilitate the efficient pre-screening of mutants for functional characteristics. By leveraging the learning capabilities of our ensemble, we were able to accelerate the identification of promising mutations and reduce the experimental burden associated with trial-and-error mutagenesis. Overall, these findings have significant implications for optimizing GEFIs and other protein-based tools, demonstrating the utility of machine learning as a powerful asset in protein engineering.

Publication types

  • Preprint