Background: Several statistical methods exist for detecting a difference in detection rates between alternative and reference qualitative microbiological assays in a single-laboratory validation study with an unpaired design.
Objective: We compared the performance of eight methods: Fisher's exact test, the unequal variance two-sample t-test, the Wilcoxon rank-sum test, the z-test, and methods based on Wilson confidence intervals, complementary log-log regression, Firth's logistic regression, and ordinary logistic regression.
Method: We first compared the minimum detectable difference in the proportion of detections between the alternative and reference methods across these statistical methods for varying numbers of test portions. We then compared the power and size of test of these methods using simulated data.
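The simulation step can be illustrated with a minimal sketch. It is not the authors' code: the function names, the seed, the number of replicates, and the detection probabilities are assumptions chosen for illustration, and only two of the eight methods are shown. The z-test here is the plain pooled two-proportion version without continuity correction, which may differ from the variant evaluated in the study. With equal detection probabilities the rejection rate estimates the size of test; with unequal probabilities it estimates power.

```python
import math
import random
from statistics import NormalDist


def fisher_exact_p(a, n1, b, n2):
    """Two-sided Fisher's exact p-value for a/n1 versus b/n2 detections."""
    total = a + b
    denom = math.comb(n1 + n2, total)

    def prob(k):
        # Hypergeometric probability of k detections in the first group
        return math.comb(n1, k) * math.comb(n2, total - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, total - n2), min(n1, total)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def z_test_p(a, n1, b, n2):
    """Two-sided pooled two-proportion z-test (assumed: no continuity correction)."""
    pooled = (a + b) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # all detections or none in both groups: no evidence of a difference
    z = (a / n1 - b / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(test, p_alt, p_ref, n, reps=1000, alpha=0.05, seed=1):
    """Fraction of simulated unpaired studies in which `test` rejects at level alpha.

    p_alt, p_ref: assumed true detection probabilities; n: test portions per method.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = sum(rng.random() < p_alt for _ in range(n))
        b = sum(rng.random() < p_ref for _ in range(n))
        if test(a, n, b, n) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for n in (12, 20, 30, 40):
        size_fisher = rejection_rate(fisher_exact_p, 0.5, 0.5, n)
        size_z = rejection_rate(z_test_p, 0.5, 0.5, n)
        print(f"n={n}: estimated size Fisher={size_fisher:.3f}, z={size_z:.3f}")
```

The estimates depend on the assumed detection probabilities and carry Monte Carlo error, so a real comparison would use many more replicates and a grid of probabilities.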
Results: Firth's logistic regression and the unequal variance two-sample t-test had the lowest minimum detectable difference and highest power. None of these statistical methods had an estimated size of test always within a 95% confidence interval of the nominal value 0.05 with small numbers of test portions (n = 12, 20, 30). Fisher's exact test, the Wilcoxon rank-sum test, and the z-test were conservative even with a moderately large number of test portions (n = 40), while Firth's logistic regression and the unequal variance two-sample t-test had a size of test closer to 0.05 than other methods.
Conclusions: Firth's logistic regression and the unequal variance two-sample t-test are better choices than the competing methods.
Highlights: We recommend the unequal variance two-sample t-test over Firth's logistic regression because the unequal variance two-sample t-test is better known and easier to use. We provide an example using real data.
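As a rough illustration of the recommended test, the sketch below applies an unequal variance (Welch) two-sample t-test to detection outcomes coded as 1 (detected) and 0 (not detected). The data are hypothetical, not the real data from the study, and the helper names are assumptions. The p-value is obtained by numerically integrating the t density so that the example needs only the standard library; a statistics package would normally be used instead.

```python
import math


def t_pdf(u, df):
    """Density of Student's t distribution with (possibly fractional) df."""
    logc = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi))
    return math.exp(logc - (df + 1) / 2 * math.log1p(u * u / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    s = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - s * h / 3)


def welch_t_test(x, y):
    """Two-sided unequal variance t-test; returns (t, df, p)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)
    se2 = v1 / n1 + v2 / n2
    if se2 == 0:
        raise ValueError("both samples have zero variance; t is undefined")
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df, 2 * t_upper_tail(abs(t), df)


if __name__ == "__main__":
    # Hypothetical data: 9 of 12 test portions detected by the alternative
    # method, 3 of 12 by the reference method.
    alternative = [1] * 9 + [0] * 3
    reference = [1] * 3 + [0] * 9
    t, df, p = welch_t_test(alternative, reference)
    print(f"t = {t:.3f}, df = {df:.1f}, two-sided p = {p:.4f}")
```

The statistic is undefined when every test portion in both groups gives the same result, which the sketch reports as an error rather than guessing a p-value.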
AOAC INTERNATIONAL 2020. This work is written by a US Government employee and is in the public domain in the US.