Feature importance is often used to explain clinical prediction models. In this work, we examine three challenges through experiments with electronic health record data: computational feasibility, the choice between methods, and the interpretation of the resulting explanations. This work aims to raise awareness of the disagreement between feature importance methods and underscores the need for guidance for practitioners on how to deal with these discrepancies.
Keywords: Evaluating explanations; Explainable AI; Prediction modelling; Shapley values; Variable importance.