Domain adaptation in small-scale and heterogeneous biological datasets

Sci Adv. 2024 Dec 20;10(51):eadp6040. doi: 10.1126/sciadv.adp6040. Epub 2024 Dec 20.

Abstract

Machine-learning models are key to modern biology, yet models trained on one dataset often fail to generalize to datasets from other cohorts or laboratories because of both technical and biological differences. Domain adaptation, a type of transfer learning, alleviates this problem by aligning different datasets so that models can be applied across them. However, most state-of-the-art domain adaptation methods were designed for large-scale data such as images, whereas biological datasets have fewer samples and more features, and are also complex and heterogeneous. This Review discusses domain adaptation methods in the context of such biological data to inform biologists and guide future domain adaptation research. We describe the benefits and challenges of domain adaptation in biological research and critically explore some of its objectives, strengths, and weaknesses. We argue for the incorporation of domain adaptation techniques into the computational biologist's toolkit, with further development of customized approaches.

Publication types

  • Review

MeSH terms

  • Computational Biology* / methods
  • Databases, Factual
  • Humans
  • Machine Learning*