The role of the p-value in the multitesting problem

J Appl Stat. 2019 Oct 23;47(9):1529-1542. doi: 10.1080/02664763.2019.1682128. eCollection 2020.

Abstract

Modern science frequently involves the analysis of large amounts of quantitative information and the simultaneous testing of thousands or even hundreds of thousands of null hypotheses. In this context, naive deductions drawn from statistical reports sometimes replace rational thinking. The reproducibility crisis is a direct consequence of misleading statistical conclusions. In this paper, the authors revisit some of the controversies over the implications derived from statistical hypothesis testing. They focus on the role of the p-value in the massive multitesting problem and the loss of its standard probabilistic interpretation. The analogy between hypothesis tests and the usual diagnostic process (both involve decision-making) is used to point out some limitations of the probabilistic interpretation of the p-value and to introduce the receiver-operating characteristic (ROC) curve as a useful tool in the large-scale multitesting context. The analysis of the well-known Hedenfalk data illustrates the problem.
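
The diagnostic analogy can be made concrete with a small simulation. The following is a minimal sketch, not the authors' procedure: it assumes normally distributed test statistics, a 95% share of true null hypotheses, and a mean shift of 2 under the alternative. All function names and parameter values are hypothetical and chosen only for illustration. The p-value is treated as a diagnostic marker: rejecting every hypothesis with p <= t yields one point (false-positive rate, true-positive rate) of the ROC curve.

```python
import math
import random


def one_sided_p_value(z):
    """Upper-tail p-value of a standard normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def simulate_p_values(n_tests=10000, null_fraction=0.95, effect=2.0, seed=0):
    """Simulate p-values; all settings are illustrative assumptions."""
    rng = random.Random(seed)
    p_values, is_null = [], []
    for _ in range(n_tests):
        null = rng.random() < null_fraction
        # True nulls have mean 0, alternatives have mean `effect`.
        z = rng.gauss(0.0 if null else effect, 1.0)
        p_values.append(one_sided_p_value(z))
        is_null.append(null)
    return p_values, is_null


def roc_point(p_values, is_null, threshold):
    """(FPR, TPR) when rejecting every hypothesis with p <= threshold."""
    n_null = sum(is_null)
    n_alt = len(is_null) - n_null
    fp = sum(1 for p, h0 in zip(p_values, is_null) if h0 and p <= threshold)
    tp = sum(1 for p, h0 in zip(p_values, is_null) if not h0 and p <= threshold)
    return fp / n_null, tp / n_alt


def false_discovery_proportion(p_values, is_null, threshold):
    """Share of rejected hypotheses that are in fact true nulls."""
    rejected = [h0 for p, h0 in zip(p_values, is_null) if p <= threshold]
    return sum(rejected) / len(rejected) if rejected else 0.0


p_values, is_null = simulate_p_values()
fpr, tpr = roc_point(p_values, is_null, 0.05)
fdp = false_discovery_proportion(p_values, is_null, 0.05)
print(f"FPR={fpr:.3f}  TPR={tpr:.3f}  FDP={fdp:.3f}")
```

Under these assumed settings, the false-positive rate at the 0.05 threshold stays close to 0.05, yet more than half of the rejected hypotheses are expected to be true nulls. This is one way to see why a small p-value loses its usual probabilistic reading when most of the tested hypotheses are null.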

Keywords: (Bio)markers; false-discovery rate; hypothesis testing; multitesting problem; p-value; receiver-operating characteristic (ROC) curve.

Grants and funding

This work is supported by Grants MTM2014-55966-P, MTM2015-63971-P, and MTM2017-89422-P (ERDF support included) from the Ministerio de Economia y Competitividad (Spain), FC-15-GRUPIN14-101 from the Asturies Government, and Severo Ochoa Grant BP16118 (the latter for S. Pérez-Fernández).