L-MAE: Longitudinal masked auto-encoder with time and severity-aware encoding for diabetic retinopathy progression prediction

Comput Biol Med. 2024 Dec 16:185:109508. doi: 10.1016/j.compbiomed.2024.109508. Online ahead of print.

Abstract

Self-supervised learning (SSL) pre-training strategies, built on pretext tasks, have proven effective for downstream tasks in computer vision. However, while SSL methods are often domain-agnostic, applying them directly to medical imaging is challenging because of the distinct nature of medical images, including specific anatomical and temporal patterns relevant to disease progression. Additionally, traditional SSL pretext tasks often lack the contextual knowledge that is essential for clinical decision support. In this paper, we developed a longitudinal masked auto-encoder (MAE) that builds on the Transformer-based MAE architecture, specifically introducing a time-aware position embedding and a disease progression-aware masking strategy. Unlike traditional sequential approaches, our method incorporates the actual time intervals between examinations, allowing it to better capture temporal trends. Furthermore, the masking strategy evolves in alignment with disease progression across follow-up examinations to capture pathological changes, improving disease progression assessment. Using the OPHDIAT dataset, a large-scale longitudinal screening dataset for diabetic retinopathy (DR), we evaluated our pre-trained model by predicting the severity level at the next visit within three years from past examination series. Our findings demonstrate that both the time-aware position embedding and the disease progression-informed masking significantly enhance predictive accuracy. Compared with conventional baseline models and standard longitudinal Transformers, these simple yet effective adaptations substantially improve the predictive power of deep classification models in this domain.

Keywords: Diabetic retinopathy; Disease progression; Longitudinal analysis; Pretext task; Self-supervised learning.
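
Illustrative sketch. The abstract describes two mechanisms, a time-aware position embedding over irregular inter-visit intervals and a masking strategy tied to disease progression, but does not give their exact formulations. The PyTorch sketch below is a hypothetical illustration only: the class and function names, the sinusoidal encoding over day offsets, and the severity-dependent masking ratio are all assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class TimeAwarePositionEmbedding(nn.Module):
    """Sinusoidal embedding over continuous time offsets (e.g. days since the
    first examination) rather than discrete visit indices, so that irregular
    intervals between visits are reflected in the positional signal.
    Hypothetical sketch; not the paper's actual formulation."""

    def __init__(self, dim: int, max_period: float = 10000.0):
        super().__init__()
        assert dim % 2 == 0, "embedding dimension must be even"
        self.dim = dim
        self.max_period = max_period

    def forward(self, time_offsets: torch.Tensor) -> torch.Tensor:
        # time_offsets: (batch, num_visits) float tensor of elapsed time.
        half = self.dim // 2
        exponent = torch.arange(half, dtype=torch.float32, device=time_offsets.device) / half
        freqs = self.max_period ** (-exponent)        # (half,)
        angles = time_offsets.unsqueeze(-1) * freqs   # (batch, visits, half)
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


def progression_aware_mask_ratio(severity_delta: torch.Tensor,
                                 base_ratio: float = 0.75,
                                 boost_per_grade: float = 0.05) -> torch.Tensor:
    """Toy heuristic (assumption): raise the masking ratio for visits whose DR
    grade increased relative to the previous exam, so reconstruction focuses
    on examinations that carry progression-related changes."""
    ratio = base_ratio + boost_per_grade * severity_delta.clamp(min=0).float()
    return ratio.clamp(max=0.95)


if __name__ == "__main__":
    embed = TimeAwarePositionEmbedding(dim=768)
    days = torch.tensor([[0.0, 180.0, 400.0]])              # one patient, three visits
    print(embed(days).shape)                                 # torch.Size([1, 3, 768])
    print(progression_aware_mask_ratio(torch.tensor([0, 1, 2])))  # tensor([0.75, 0.80, 0.85])

In this sketch the positional signal is a function of elapsed time rather than visit order, and the masking schedule depends on the change in DR grade between consecutive exams, mirroring in spirit the two adaptations the abstract reports.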