Background: Applications of artificial intelligence (AI) are pervasive in modern biomedical science. In fact, the number of studies proposing algorithms and AI models for different target diseases and conditions is continuously increasing. While this trend undoubtedly improves the outcomes of AI models, health care providers are increasingly unsure which AI model to use, given the multiple alternatives for a specific target and the "black box" nature of AI. Moreover, studies rarely follow guidelines when developing and reporting AI models, which makes it harder to trust models and adapt them for practical implementation.
Objective: This review protocol describes the planned steps and methods for a review of the synthesized evidence regarding the quality of available guidelines and frameworks to facilitate AI applications in medicine.
Methods: We will conduct a systematic literature search using Medical Subject Headings (MeSH) terms for medicine, guidelines, and machine learning (ML). All available guidelines, standard frameworks, best practices, checklists, and recommendations will be included, irrespective of the study design. The search will be conducted on web-based repositories such as PubMed, Web of Science, and the EQUATOR (Enhancing the Quality and Transparency of Health Research) network. After duplicate results are removed, 2 reviewers will perform a preliminary screening of titles. The reviewers will then screen the abstracts of the selected literature, and any disagreements about whether to include an article for full-text review will be resolved by a third and fourth reviewer based on the predefined criteria. A Google Scholar (Google LLC) search will also be performed to identify gray literature. The quality of the identified guidelines will be evaluated using the Appraisal of Guidelines for Research and Evaluation II (AGREE II) tool. A descriptive summary and narrative synthesis will be carried out, and the details of the critical appraisal and subgroup synthesis findings will be presented.
Results: The results will be reported using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) reporting guidelines. Data analysis is currently underway, and we anticipate finalizing the review by November 2023.
Conclusions: Guidelines and recommended frameworks for developing, reporting, and implementing AI studies have been developed by different experts to facilitate the reliable assessment of validity and consistent interpretation of ML models for medical applications. We postulate that a guideline supports the assessment of an ML model only if the quality and reliability of the guideline are high. Assessing the quality and aspects of available guidelines, recommendations, checklists, and frameworks, as will be done in the proposed review, will provide comprehensive insights into current gaps and help to formulate future research directions.
International Registered Report Identifier (IRRID): DERR1-10.2196/47105.
Keywords: artificial intelligence; biomedical; guidelines; machine learning; medicine.
©Kirubel Biruk Shiferaw, Moritz Roloff, Dagmar Waltemath, Atinkut Alamirrew Zeleke. Originally published in JMIR Research Protocols (https://www.researchprotocols.org), 25.10.2023.