Pre-training strategies based on self-supervised learning (SSL) have proven effective for downstream tasks in computer vision. However, although SSL methods are often domain-agnostic, applying them directly to medical imaging is challenging because of the distinct nature of medical images, including specific anatomical and temporal patterns relevant to disease progression. In addition, traditional SSL pretext tasks often lack the contextual knowledge that is essential for clinical decision support. In this paper, we develop a longitudinal masked auto-encoder (MAE) that builds on the Transformer-based MAE architecture and introduces a time-aware position embedding and a disease progression-aware masking strategy. Unlike traditional sequential approaches, our method incorporates the actual time intervals between examinations, which allows it to better capture temporal trends. Furthermore, the masking strategy evolves in alignment with disease progression during follow-up exams to capture pathological changes, improving the assessment of disease progression. Using the OPHDIAT dataset, a large-scale longitudinal screening dataset for diabetic retinopathy (DR), we evaluated our pre-trained model by predicting the severity level at the next visit within three years, based on the series of past examinations. Our findings demonstrate that both the time-aware position embedding and the disease progression-informed masking significantly enhance predictive accuracy. Compared to conventional baseline models and standard longitudinal Transformers, these simple yet effective adaptations substantially improve the predictive power of deep classification models in this domain.
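To make the idea of a time-aware position embedding concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a standard sinusoidal position encoding evaluated at the real elapsed time of each examination (here in months since the first visit) instead of at the visit index. The function name `time_aware_position_embedding`, the `max_period` parameter, the choice of months as the time unit, and the example visit times are all hypothetical.

```python
import math


def time_aware_position_embedding(times, dim, max_period=10000.0):
    """Sinusoidal embedding evaluated at real-valued visit times.

    Hypothetical sketch: `times` holds the elapsed time of each exam
    (e.g. months since the first visit), not the exam's index.
    Returns one list of `dim` floats per visit.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies, as in the usual Transformer encoding.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    rows = []
    for t in times:
        angles = [t * f for f in freqs]
        # First half: sine terms; second half: cosine terms.
        rows.append([math.sin(a) for a in angles] + [math.cos(a) for a in angles])
    return rows


# Illustrative, made-up visit times in months: irregular follow-up intervals.
visit_times = [0.0, 12.0, 13.0, 30.0]
emb = time_aware_position_embedding(visit_times, dim=8)

# Index-based encoding of the same four visits, for comparison.
index_emb = time_aware_position_embedding(range(4), dim=8)
```

Under this sketch, two examinations one month apart receive closer embeddings than two examinations seventeen months apart, whereas an index-based encoding treats every consecutive pair of visits identically.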
Keywords: Diabetic retinopathy; Disease progression; Longitudinal analysis; Pretext task; Self-supervised learning.
Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.