Foundation models in ophthalmology: opportunities and challenges

Mertcan Sevgi; Eden Ruffell; Fares Antaki; Mark A Chia; Pearse A Keane

doi:10.1097/ICU.0000000000001091

Curr Opin Ophthalmol. 2025 Jan 1;36(1):90-98. doi: 10.1097/ICU.0000000000001091. Epub 2024 Nov 4.

Authors

Mertcan Sevgi^{1,2,3}, Eden Ruffell^{1,4,5,3}, Fares Antaki^{1,2,6}, Mark A Chia^{1,2,3}, Pearse A Keane^{1,2,3}

Affiliations

¹ Institute of Ophthalmology, University College London.
² Moorfields Eye Hospital NHS Foundation Trust.
³ NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust, London, UK.
⁴ Institute of Health Informatics.
⁵ Centre for Medical Image Computing, University College London.
⁶ The CHUM School of Artificial Intelligence in Healthcare, Montreal, Quebec, Canada.

Abstract

Purpose of review: Last year marked the development of the first foundation model in ophthalmology, RETFound, setting the stage for generalizable medical artificial intelligence (GMAI) that can adapt to novel tasks. Additionally, rapidly advancing large language models (LLMs), including GPT-4 and Gemini, have been tailored for medical specialization and evaluated on clinical scenarios with promising results. This review explores the opportunities and challenges for further advancements in these technologies.

Recent findings: RETFound outperforms traditional deep learning models in specific tasks, even when only fine-tuned on small datasets. Additionally, large multimodal models (LMMs) like Med-Gemini and Medprompt GPT-4 perform better than out-of-the-box models for ophthalmology tasks. However, there is still a significant shortage of ophthalmology-specific multimodal models. This gap is primarily due to the substantial computational resources required to train these models and the limited availability of high-quality ophthalmology datasets.

Summary: Overall, foundation models in ophthalmology present promising opportunities but face challenges, particularly the need for high-quality, standardized datasets for training and specialization. Although development has primarily focused on large language and vision models, the greatest opportunities lie in advancing large multimodal models, which can more closely mimic the capabilities of clinicians.

Publication types

Review

MeSH terms

Artificial Intelligence*
Humans
Ophthalmology*