Progress and opportunities of foundation models in bioinformatics

Qing Li; Zhihang Hu; Yixuan Wang; Lei Li; Yimin Fan; Irwin King; Gengjie Jia; Sheng Wang; Le Song; Yu Li

doi:10.1093/bib/bbae548

Brief Bioinform. 2024 Sep 23;25(6):bbae548. doi: 10.1093/bib/bbae548.

Authors

Qing Li¹, Zhihang Hu¹, Yixuan Wang¹, Lei Li¹, Yimin Fan¹, Irwin King¹, Gengjie Jia², Sheng Wang³,⁴, Le Song⁵, Yu Li¹

Affiliations

¹ Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China.
² Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, 518120, China.
³ Shanghai Zelixir Biotech Company Ltd., Shanghai, 200030, China.
⁴ Shenzhen Institute of Advanced Technology, Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen, Guangdong, 518055, China.
⁵ BioMap, Zhongguancun Life Science Park, Haidian District, Beijing, 100085, China.

Abstract

Bioinformatics has undergone a paradigm shift driven by artificial intelligence (AI), particularly through foundation models (FMs), which address longstanding challenges in bioinformatics such as limited annotated data and data noise. These AI techniques have demonstrated remarkable efficacy across various downstream validation tasks, effectively representing diverse biological entities and heralding a new era in computational biology. The primary goal of this survey is to conduct a general investigation and summary of FMs in bioinformatics, tracing their evolutionary trajectory, current research landscape, and methodological frameworks. Our primary focus is on elucidating the application of FMs to specific biological problems, offering insights to guide the research community in choosing appropriate FMs for tasks like sequence analysis, structure prediction, and function annotation. Each section delves into the intricacies of the targeted challenges, contrasting the architectures and advancements of FMs with conventional methods and showcasing their utility across different biological domains. Further, this review scrutinizes the hurdles and constraints encountered by FMs in biology, including issues of data noise, model interpretability, and potential biases. This analysis provides a theoretical groundwork for understanding the circumstances under which certain FMs may exhibit suboptimal performance. Lastly, we outline prospective pathways and methodologies for the future development of FMs in biological research, facilitating ongoing innovation in the field. This comprehensive examination serves not only as an academic reference but also as a roadmap for forthcoming explorations and applications of FMs in biology.

Keywords: artificial intelligence; bioinformatics; foundation models; large language models.
