Foundation model for efficient biological discovery in single-molecule data

bioRxiv [Preprint]. 2024 Aug 27:2024.08.26.609721. doi: 10.1101/2024.08.26.609721.

Abstract

Modern data-intensive techniques offer ever deeper insights into biology, but render the process of discovery increasingly complex. For example, exploiting the unique ability of single-molecule fluorescence microscopy (SMFM) [1-5] to uncover rare but critical intermediates often demands manual inspection of time traces and iterative ad hoc approaches that are difficult to systematize. To facilitate systematic and efficient discovery from SMFM data, we introduce META-SiM, a transformer-based foundation model pre-trained on diverse SMFM analysis tasks. META-SiM achieves high performance, rivaling best-in-class algorithms, on a broad range of analysis tasks including trace selection, classification, segmentation, idealization, and stepwise photobleaching analysis. Additionally, the model produces high-dimensional embedding vectors that encapsulate detailed information about each trace, which the web-based META-SiM Projector (https://www.simol-projector.org) casts into lower-dimensional space for efficient whole-dataset visualization, labeling, comparison, and sharing. Combining this Projector with the objective metric of Local Shannon Entropy enables rapid identification of condition-specific behaviors, even if rare or subtle. As a result, by applying META-SiM to an existing single-molecule Förster resonance energy transfer (smFRET) dataset [6], we discover a previously unobserved intermediate state in pre-mRNA splicing. META-SiM thus removes bottlenecks, improves objectivity, and both systematizes and accelerates biological discovery in complex single-molecule data.
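
The abstract does not define Local Shannon Entropy, so the following is only a minimal sketch of one plausible reading: for each trace embedding, take its k nearest neighbors in embedding space and compute the Shannon entropy of their experimental-condition labels. The function name `local_shannon_entropy`, the parameter `k`, the Euclidean metric, and the use of base-2 logarithms are all assumptions made for illustration and are not taken from META-SiM itself.

```python
import numpy as np


def local_shannon_entropy(embeddings, labels, k=10):
    """Entropy (in bits) of condition labels among each point's k nearest neighbors.

    Hypothetical illustration only: the neighborhood definition (k-NN,
    Euclidean distance) is an assumption, not the published method.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels).ravel()
    n = embeddings.shape[0]
    if labels.shape[0] != n:
        raise ValueError("embeddings and labels must have the same length")
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise squared Euclidean distances (O(n^2) memory; fine for a sketch).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbor

    # Indices of the k closest points for every row (order within k is arbitrary).
    neighbors = np.argpartition(dist2, k - 1, axis=1)[:, :k]

    # Encode labels as integers and count each condition in every neighborhood.
    classes, encoded = np.unique(labels, return_inverse=True)
    neighbor_labels = encoded[neighbors]  # shape (n, k)
    counts = np.stack(
        [(neighbor_labels == c).sum(axis=1) for c in range(len(classes))],
        axis=1,
    )
    probs = counts / k

    # Shannon entropy with the convention 0 * log2(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(probs > 0, probs * np.log2(probs), 0.0)
    return -terms.sum(axis=1)
```

Under this reading, a trace whose neighbors all come from one condition scores 0 bits, while a trace whose neighbors are split evenly between two conditions scores 1 bit. Low-entropy regions of the embedding would therefore be candidates for condition-specific behavior.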
