Domain adaptation in small-scale and heterogeneous biological datasets

Seyedmehdi Orouji; Martin C Liu; Tal Korem; Megan A K Peters

doi:10.1126/sciadv.adp6040

Sci Adv. 2024 Dec 20;10(51):eadp6040. doi: 10.1126/sciadv.adp6040. Epub 2024 Dec 20.

Authors

Seyedmehdi Orouji¹, Martin C Liu²,³, Tal Korem³,⁴,⁵, Megan A K Peters¹,⁵,⁶

Affiliations

¹ Department of Cognitive Sciences, University of California Irvine, Irvine, CA, USA.
² Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.
³ Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.
⁴ Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA.
⁵ CIFAR Azrieli Global Scholars Program, CIFAR, Toronto, Canada.
⁶ CIFAR Fellow, Program in Brain, Mind, & Consciousness, CIFAR, Toronto, Canada.

Abstract

Machine-learning models are key to modern biology, yet models trained on one dataset often fail to generalize to datasets from other cohorts or laboratories because of both technical and biological differences. Domain adaptation, a type of transfer learning, alleviates this problem by aligning different datasets so that models can be applied across them. However, most state-of-the-art domain adaptation methods were designed for large-scale data such as images, whereas biological datasets are smaller, have more features, and are complex and heterogeneous. This Review discusses domain adaptation methods in the context of such biological data to inform biologists and guide future domain adaptation research. We describe the benefits and challenges of domain adaptation in biological research and critically explore some of its objectives, strengths, and weaknesses. We argue for incorporating domain adaptation techniques into the computational biologist's toolkit, with further development of customized approaches.

Publication types

Review

MeSH terms

Computational Biology* / methods
Databases, Factual
Humans
Machine Learning*