Deep learning-based algorithm for assessment of knee osteoarthritis severity in radiographs matches performance of radiologists
Authors:
Albert Swiecicki,
Nianyi Li,
Jonathan O'Donnell,
Nicholas Said,
Jichen Yang,
Richard C. Mather,
William A. Jiranek,
Maciej A. Mazurowski
Abstract:
A fully-automated deep learning algorithm matched performance of radiologists in assessment of knee osteoarthritis severity in radiographs using the Kellgren-Lawrence grading system.
To develop an automated deep learning-based algorithm that jointly uses Posterior-Anterior (PA) and Lateral (LAT) views of knee radiographs to assess knee osteoarthritis severity according to the Kellgren-Lawrence g…
▽ More
A fully-automated deep learning algorithm matched performance of radiologists in assessment of knee osteoarthritis severity in radiographs using the Kellgren-Lawrence grading system.
To develop an automated deep learning-based algorithm that jointly uses Posterior-Anterior (PA) and Lateral (LAT) views of knee radiographs to assess knee osteoarthritis severity according to the Kellgren-Lawrence grading system.
We used a dataset of 9739 exams from 2802 patients from Multicenter Osteoarthritis Study (MOST). The dataset was divided into a training set of 2040 patients, a validation set of 259 patients and a test set of 503 patients. A novel deep learning-based method was utilized for assessment of knee OA in two steps: (1) localization of knee joints in the images, (2) classification according to the KL grading system. Our method used both PA and LAT views as the input to the model. The scores generated by the algorithm were compared to the grades provided in the MOST dataset for the entire test set as well as grades provided by 5 radiologists at our institution for a subset of the test set.
The model obtained a multi-class accuracy of 71.90% on the entire test set when compared to the ratings provided in the MOST dataset. The quadratic weighted Kappa coefficient for this set was 0.9066. The average quadratic weighted Kappa between all pairs of radiologists from our institution who took a part of study was 0.748. The average quadratic-weighted Kappa between the algorithm and the radiologists at our institution was 0.769.
The proposed model performed demonstrated equivalency of KL classification to MSK radiologists, but clearly superior reproducibility. Our model also agreed with radiologists at our institution to the same extent as the radiologists with each other. The algorithm could be used to provide reproducible assessment of knee osteoarthritis severity.
△ Less
Submitted 25 July, 2022;
originally announced July 2022.
Detection of masses and architectural distortions in digital breast tomosynthesis: a publicly available dataset of 5,060 patients and a deep learning model
Authors:
Mateusz Buda,
Ashirbani Saha,
Ruth Walsh,
Sujata Ghate,
Nianyi Li,
Albert Święcicki,
Joseph Y. Lo,
Maciej A. Mazurowski
Abstract:
Breast cancer screening is one of the most common radiological tasks with over 39 million exams performed each year. While breast cancer screening has been one of the most studied medical imaging applications of artificial intelligence, the development and evaluation of the algorithms are hindered due to the lack of well-annotated large-scale publicly available datasets. This is particularly an is…
▽ More
Breast cancer screening is one of the most common radiological tasks with over 39 million exams performed each year. While breast cancer screening has been one of the most studied medical imaging applications of artificial intelligence, the development and evaluation of the algorithms are hindered due to the lack of well-annotated large-scale publicly available datasets. This is particularly an issue for digital breast tomosynthesis (DBT) which is a relatively new breast cancer screening modality. We have curated and made publicly available a large-scale dataset of digital breast tomosynthesis images. It contains 22,032 reconstructed DBT volumes belonging to 5,610 studies from 5,060 patients. This included four groups: (1) 5,129 normal studies, (2) 280 studies where additional imaging was needed but no biopsy was performed, (3) 112 benign biopsied studies, and (4) 89 studies with cancer. Our dataset included masses and architectural distortions which were annotated by two experienced radiologists. Additionally, we developed a single-phase deep learning detection model and tested it using our dataset to serve as a baseline for future research. Our model reached a sensitivity of 65% at 2 false positives per breast. Our large, diverse, and highly-curated dataset will facilitate development and evaluation of AI algorithms for breast cancer screening through providing data for training as well as common set of cases for model validation. The performance of the model developed in our study shows that the task remains challenging and will serve as a baseline for future model development.
△ Less
Submitted 20 November, 2022; v1 submitted 13 November, 2020;
originally announced November 2020.