Many biological problems are understudied due to experimental limitations and human biases. Although deep learning is promising in accelerating scientific discovery, its power compromises when applied to problems with scarcely labeled data and data distribution shifts. We developed a semi-supervised meta learning framework - Meta Model Agnostic Pseudo Label Learning (MMAPLE) - to address these challenges by effectively exploring out-of-distribution (OOD) unlabeled data when transfer learning fails. The power of MMAPLE is demonstrated in multiple applications: predicting OOD drug-target interactions, hidden human metabolite-enzyme interactions, and understudied interspecies microbiome metabolite-human receptor interactions, where chemicals or proteins in unseen data are dramatically different from those in training data. MMAPLE achieves 11% to 242% improvement in the prediction-recall on multiple OOD benchmarks over baseline models. Using MMAPLE, we reveal novel interspecies metabolite-protein interactions that are validated by bioactivity assays and fill in missing links in microbiome-human interactions. MMAPLE is a general framework to explore previously unrecognized biological domains beyond the reach of present experimental and computational techniques.
Keywords: deep learning; drug-target interaction; machine learning; metabolite-protein interaction; microbiome-host interaction; pseudo-label; transfer learning.