Investigating item response theory model performance in the context of evaluating clinical outcome assessments in clinical trials

Qual Life Res. 2024 Dec 12. doi: 10.1007/s11136-024-03873-z. Online ahead of print.

Abstract

Purpose: Item response theory (IRT) models are an increasingly popular choice for evaluating clinical outcome assessments (COAs) for use in clinical trials. Given common constraints in clinical trial design, such as limited sample sizes and short assessments, the current study aimed to examine the appropriateness of two commonly used polytomous IRT models, the graded response model (GRM) and the partial credit model (PCM), as they are frequently applied in the psychometric evaluation of COAs in clinical trials.
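For readers less familiar with the two models, a minimal sketch of their standard category response functions may help. It assumes the textbook forms, Samejima's GRM with one slope per item and Masters' PCM with a slope shared by all items (fixed at 1 in the Rasch tradition); the function names are hypothetical and this is not the study's code. The contrast that matters for the slope-equality conditions is that `grm_probs` takes an item-specific slope `a`, whereas the PCM constrains every item to a common slope.

```python
import math

def grm_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait value; a: this item's slope;
    thresholds: ordered b_1 < ... < b_{K-1} for an item with K categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 for k = 0 and 0 for k = K
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # P(X = k) is the difference between adjacent cumulative probabilities
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def pcm_probs(theta, steps, a=1.0):
    """Category probabilities under Masters' partial credit model.

    steps: step difficulties delta_1..delta_{K-1};
    a: slope shared by all items (1.0 in the Rasch-family PCM).
    """
    # Exponent for category k is the running sum of a * (theta - delta_j) for j <= k
    exps = [0.0]
    for d in steps:
        exps.append(exps[-1] + a * (theta - d))
    m = max(exps)  # subtract the maximum for numerical stability
    weights = [math.exp(e - m) for e in exps]
    total = sum(weights)
    return [w / total for w in weights]

print(grm_probs(0.0, 1.0, [-1.0, 0.0, 1.0]))
print(pcm_probs(0.0, [-1.0, 0.0, 1.0]))
```

Both calls return four probabilities that sum to 1. GRM thresholds and PCM step difficulties are different parameters, so feeding the same numbers to both models yields different distributions (here the GRM puts more mass in the extreme categories).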

Methods: Data were simulated under varying sample sizes, measure lengths, numbers of response categories, and slope strengths, as well as under conditions that violated two model assumptions: unidimensionality and equality of item slopes. Model fit, detection of item local dependence, and detection of item misfit were examined to identify conditions under which one model may be preferable or its results may be biased.
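A simulation of this kind can be sketched as follows. This is a minimal, hypothetical generator, not the authors' procedure: it assumes GRM-generated responses, normally distributed slopes whose spread (`slope_sd`) controls slope variability (0 gives equal slopes), normally distributed ordered thresholds, and a unidimensionality violation created by letting the last `second_dim_items` items measure a second latent trait that correlates `factor_corr` with the first. All names and default values are illustrative assumptions.

```python
import math
import random

def simulate_grm(n_persons, n_items, n_categories, slope_mean=1.5, slope_sd=0.0,
                 second_dim_items=0, factor_corr=0.5, seed=1):
    """Hypothetical GRM data generator with optional slope variability and a second factor."""
    rng = random.Random(seed)
    # Slopes vary around slope_mean; truncated at 0.2 (an assumption) to stay positive
    slopes = [max(0.2, rng.gauss(slope_mean, slope_sd)) for _ in range(n_items)]
    # Ordered thresholds per item (K - 1 thresholds for K categories)
    thresholds = [sorted(rng.gauss(0.0, 1.0) for _ in range(n_categories - 1))
                  for _ in range(n_items)]
    data = []
    for _ in range(n_persons):
        theta1 = rng.gauss(0.0, 1.0)
        # Second trait with correlation factor_corr to the first
        theta2 = factor_corr * theta1 + math.sqrt(1 - factor_corr ** 2) * rng.gauss(0.0, 1.0)
        row = []
        for i in range(n_items):
            # The last `second_dim_items` items load on theta2, violating unidimensionality
            theta = theta2 if i >= n_items - second_dim_items else theta1
            u = rng.random()
            # Cumulative P(X >= k) decreases in k, so the number of them exceeding u
            # is a draw from the GRM category distribution
            x = sum(1 for b in thresholds[i]
                    if u < 1.0 / (1.0 + math.exp(-slopes[i] * (theta - b))))
            row.append(x)
        data.append(row)
    return data, slopes, thresholds

data, slopes, thresholds = simulate_grm(200, 10, 5, slope_sd=0.5, second_dim_items=3)
```

Crossing sample size, item count, category count, `slope_sd`, and `second_dim_items`, then fitting both the GRM and the PCM to each generated data set with standard IRT software, would give the kind of condition-by-model comparison described above.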

Results: For unidimensional item sets with equal item slopes, the PCM and GRM performed similarly, and GRM performance remained consistent as slope variability increased. For multidimensional item sets, the PCM was somewhat more sensitive to the violation of unidimensionality. Across conditions, the PCM showed no clear advantage over the GRM for small sample sizes or shorter measures.

Conclusion: Overall, the GRM and the PCM each demonstrated advantages and disadvantages depending on the underlying data conditions and the model outcome investigated. We recommend careful consideration of the known or expected data characteristics when choosing a model and interpreting its results.

Keywords: Clinical outcome assessments (COAs); Clinical trials; Item response theory; Patient-reported outcomes (PROs); Psychometric validation.