The development of computational models for studying mental disorders is on the rise. However, their psychometric properties remain understudied, posing a risk of undermining their use in empirical research and clinical translation. Here we investigated test-retest reliability (with a 2-week interval) of a computational assay probing advice-taking under volatility with a Hierarchical Gaussian Filter (HGF) model. In a sample of 39 healthy participants, we found the computational measures to have largely poor reliability (intra-class correlation coefficient or ICC < 0.5), on par with the behavioral measures of task performance. Further analysis revealed that reliability was substantially impacted by intrinsic measurement noise (indicated by parameter recovery analysis) and to a smaller extent by practice effects. However, a large portion of within-subject variance remained unexplained and may be attributable to state-like fluctuations. Despite the poor test-retest reliability, we found the assay to have face validity at the group level. Overall, our work highlights that the different sources of variance affecting test-retest reliability need to be studied in greater detail. A better understanding of these sources would facilitate the design of more psychometrically sound assays, which would improve the quality of future research and increase the probability of clinical translation.
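As an illustration of the reliability criterion used above, the following is a minimal sketch of how a test-retest ICC can be computed from a subjects-by-sessions table of parameter estimates. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1) or ICC(A,1)) is an assumption here, and the function name `icc_absolute_agreement` is hypothetical.

```python
def icc_absolute_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` is a list of rows, one per subject, each row holding that
    subject's estimate in every session (e.g. [session_1, session_2]).
    The choice of this ICC form is an assumption, not taken from the paper.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of sessions
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy data (not from the study): identical sessions give perfect reliability,
# while a uniform shift between sessions (a pure practice effect) lowers
# absolute agreement even though the rank order of subjects is preserved.
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
shifted = [[1.0, 6.0], [2.0, 7.0], [3.0, 8.0]]
print(icc_absolute_agreement(identical))
print(icc_absolute_agreement(shifted))
```

Under this form, a systematic session effect such as practice counts against reliability, which is one reason the choice of ICC variant matters when separating practice effects from measurement noise.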
Copyright: © 2024 Karvelis et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.