The prevailing practice in healthcare artificial intelligence (AI) starts with model development, often using state-of-the-art AI, which is then evaluated retrospectively using metrics borrowed from the AI literature, such as AUROC and Dice score. However, good performance on these metrics may not translate to improved clinical outcomes. Instead, we argue for a development pipeline constructed by working backward from the end goal of positively impacting clinically relevant outcomes using AI, which leads to considerations of causality in model development and validation. Healthcare AI should be "actionable," and the change in actions induced by AI should improve outcomes. Quantifying the effect of changes in actions on outcomes is a causal inference problem. The development, evaluation, and validation of healthcare AI should therefore account for the causal effect of intervening with the AI on clinically relevant outcomes. Using a causal lens, we make recommendations for key stakeholders at various stages of the healthcare AI pipeline. Our recommendations aim to increase the positive impact of AI on clinical outcomes.
Keywords: artificial intelligence; causal inference; healthcare.
© The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association.