Background: Symptom manifestations in mood disorders can be subtle. Cumulatively, small imprecisions in measurement can limit our ability to measure treatment response accurately. Logical and statistical consistency checks between item responses (i.e., cross-sectionally) and across administrations (i.e., longitudinally) can contribute to improving measurement fidelity.
Methods: The International Society for CNS Clinical Trials and Methodology convened an expert Working Group that assembled flags indicating the consistency or inconsistency of ratings on the Hamilton Rating Scale for Depression (HAM-D17), a widely used rating scale in studies of depression. Proposed flags were applied to assessments derived from the NEWMEDS data repository of 95,468 HAM-D administrations from 32 registration trials of antidepressant medications and to Monte Carlo-simulated data as a proxy for applying flags under conditions of known inconsistency.
Results: Two types of flags were derived: logical consistency checks and statistical outlier-response pattern checks. Almost 30% of the HAM-D administrations had at least one logical scoring inconsistency flag. Seven percent had flags judged to suggest that a thorough review of the rating is warranted. Almost 22% of the administrations had at least one statistical outlier flag and 7.9% had more than one. Most of the administrations in the Monte Carlo-simulated data raised multiple flags.
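As an illustration only, the two types of flags could be sketched as follows. The rules, thresholds, and function names (`logical_flags`, `outlier_flags`, `z_cut`) are hypothetical assumptions chosen for this example and are not the Working Group's actual flags; the item numbers follow the standard HAM-D17 ordering (item 1 depressed mood, item 3 suicide, item 7 work and activities).

```python
from statistics import mean, stdev


def logical_flags(items):
    """Return logical-consistency flags for one administration.

    `items` maps HAM-D17 item number (1..17) to its score.
    Both rules below are hypothetical examples, not published flags.
    """
    flags = []
    # Hypothetical rule: suicide rated high while depressed mood is absent.
    if items[3] >= 3 and items[1] == 0:
        flags.append("suicide rated high with depressed mood absent")
    # Hypothetical rule: work impairment rated high while every other item is 0.
    others = [score for number, score in items.items() if number != 7]
    if items[7] >= 3 and all(score == 0 for score in others):
        flags.append("work impairment rated high with all other items absent")
    return flags


def outlier_flags(totals, z_cut=2.5):
    """Flag total scores lying more than `z_cut` sample SDs from the mean.

    A deliberately simple stand-in for a statistical outlier check;
    `z_cut` is an assumed threshold.
    """
    centre, spread = mean(totals), stdev(totals)
    if spread == 0:
        return [False] * len(totals)
    return [abs(t - centre) / spread > z_cut for t in totals]


# One administration: all items 0 except suicide (item 3) rated 3.
rating = {number: 0 for number in range(1, 18)}
rating[3] = 3
print(logical_flags(rating))
```

A flagged rating in such a sketch would only prompt review; it would not by itself establish that the rating is wrong.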
Limitations: Flagged ratings may represent less-common presentations of administrations done correctly.
Conclusions: Application of flags to clinical ratings may aid in detecting imprecise measurement. Reviewing and addressing these flags may improve reliability and validity of clinical trial data.
Keywords: Careless ratings; Consistency of measurement; HAM-D17; Hamilton Rating Scale for Depression; Inconsistent ratings; NEWMEDS.
Copyright © 2022 The Author(s). Published by Elsevier B.V. All rights reserved.