Background: Central statistical monitoring in multicenter trials could allow trialists to identify centers with problematic data or conduct and intervene while the trial is still ongoing. Currently, there are few published models that can be used for this purpose.
Purpose: To develop and validate a series of risk scores to identify fabricated data within a multicenter trial, to be used in central statistical monitoring.
Methods: We used a database from a multicenter trial in which data from 9 of 109 centers were documented to be fabricated. These data were used to build a series of risk scores to predict fraud at centers. All analyses were performed at the level of the center. Exploratory factor analysis was used to select from 52 possible predictors, chosen from a variety of previously published methods. The final models were selected from a total of 18 independent predictors, based on the factors identified. These models were converted to risk scores for each center.
Results: Five different risk scores were identified, and each discriminated well between centers with and without fabricated data (area under the curve values ranged from 0.90 to 0.95). True- and false-positive rates are presented for each risk score to arrive at a recommended cutoff of seven or above (high risk score). We validated these risk scores using an independent multicenter trial database that contained no data fabrication and found the occurrence of false-positive high scores to be low and comparable to that in the model-building data set.
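The discrimination and cutoff measures reported above can be illustrated with a minimal sketch, assuming each center has one integer risk score and a known fabrication status. The function names `auc` and `rates_at_cutoff` and the toy values are hypothetical and are not taken from the trial data; the area under the curve is computed here with the standard rank-based (Mann-Whitney) formulation, which may differ from the software used in the original analysis.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], fabricated: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    fabricating center scores higher than a randomly chosen clean center
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, fabricated) if y == 1]
    neg = [s for s, y in zip(scores, fabricated) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one center in each group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rates_at_cutoff(
    scores: Sequence[float], fabricated: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """True- and false-positive rates when a center is flagged as high
    risk if its score is at or above the cutoff."""
    n_pos = sum(1 for y in fabricated if y == 1)
    n_neg = sum(1 for y in fabricated if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one center in each group")
    tp = sum(1 for s, y in zip(scores, fabricated) if y == 1 and s >= cutoff)
    fp = sum(1 for s, y in zip(scores, fabricated) if y == 0 and s >= cutoff)
    return tp / n_pos, fp / n_neg


if __name__ == "__main__":
    # Hypothetical toy data for eight centers (NOT from the trial).
    toy_scores = [9, 8, 6, 7, 3, 2, 5, 1]
    toy_fabricated = [1, 1, 1, 0, 0, 0, 0, 0]
    print("AUC:", auc(toy_scores, toy_fabricated))
    print("TPR, FPR at cutoff 7:", rates_at_cutoff(toy_scores, toy_fabricated, 7))
```

In this sketch, raising the cutoff lowers the false-positive rate at the cost of missing more fabricating centers, which is the trade-off that tabulating both rates per score is meant to expose.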
Limitations: These risk scores have been validated only for their false-positive rate and require validation within another trial that contains centers that have fabricated data. Validation in noncardiovascular trials is also required to gauge the usefulness of these risk scores in central statistical monitoring.
Conclusions: With further validation, these risk scores could become part of a series of tools that provide evidence-based central statistical monitoring, which in turn can improve the efficiency of trials and minimize the need for more expensive on-site monitoring.