Accurate and Ultra-Efficient p-Value Calculation for Higher Criticism Tests

Wenjia Wang; Yusi Fang; Chung Chang; George C Tseng

doi:10.1080/10618600.2023.2270720

J Comput Graph Stat. 2024;33(2):463-476. doi: 10.1080/10618600.2023.2270720. Epub 2023 Nov 27.

Authors

Wenjia Wang¹, Yusi Fang¹, Chung Chang², George C Tseng¹

Affiliations

¹ Department of Biostatistics, University of Pittsburgh.
² Department of Applied Mathematics, National Sun Yat-sen University.

Abstract

In modern data science, the higher criticism (HC) method is effective for detecting rare and weak signals. Computation, however, has long been an issue when the number of p-values combined ($K$) and/or the number of repeated HC tests ($N$) is large. Some computing methods have been developed, but they all have significant shortcomings, especially when a stringent significance level is required. In this paper, we propose an accurate and highly efficient computing strategy for four variations of HC. Specifically, we propose an unbiased cross-entropy-based importance sampling method ($\mathrm{IS}_{CE}$) to benchmark all existing computing methods, and develop a modified SetTest method (MST) that resolves numerical issues of the existing SetTest approach. We further develop an ultra-fast approach (UFI) combining pre-calculated statistical tables and cubic spline interpolation. Finally, following extensive simulations, we provide a computing strategy integrating MST, UFI and other existing methods with the R package "HCp" for virtually any $K$ and small p-values ($\sim 10^{-20}$). The method is applied to a COVID-19 disease surveillance example for spatio-temporal outbreak detection from case numbers over 804 days in 3,342 counties in the United States. Results confirm the viability of the computing strategy for large-scale inference. Supplementary materials for this article are available online.
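
To make the computational issue concrete, the following is a minimal Python sketch of one commonly used form of the HC statistic (the Donoho and Jin formulation; the abstract does not say which four variations the paper covers, so this choice is an assumption) together with a naive Monte Carlo p-value. It is not the paper's $\mathrm{IS}_{CE}$, MST, or UFI method, and it does not use the "HCp" package; the function names `hc_statistic` and `naive_mc_pvalue` are hypothetical and introduced only for illustration.

```python
import math
import random


def hc_statistic(pvalues):
    """One common HC form (assumed here, not taken from the paper):
    max over i of sqrt(K) * (i/K - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(K) are the sorted input p-values."""
    k = len(pvalues)
    sorted_p = sorted(pvalues)
    best = -math.inf
    for i, p in enumerate(sorted_p, start=1):
        # Skip p-values of exactly 0 or 1 to avoid division by zero.
        if 0.0 < p < 1.0:
            z = math.sqrt(k) * (i / k - p) / math.sqrt(p * (1.0 - p))
            best = max(best, z)
    return best


def naive_mc_pvalue(observed, k, n_sim=2000, seed=0):
    """Naive Monte Carlo p-value: simulate K independent Uniform(0, 1)
    p-values under the null and count how often the simulated HC
    statistic reaches the observed one."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        null_p = [rng.random() for _ in range(k)]
        if hc_statistic(null_p) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    example = [0.0004, 0.002, 0.03, 0.2, 0.41, 0.55, 0.63, 0.78, 0.9, 0.97]
    hc = hc_statistic(example)
    print("HC statistic:", hc)
    print("naive MC p-value:", naive_mc_pvalue(hc, k=len(example)))
```

With `n_sim` simulations this estimator cannot report anything smaller than `1 / (n_sim + 1)`, so reaching p-values near $10^{-20}$ by plain simulation is infeasible; that limitation is what motivates importance sampling, analytical approximation, and table interpolation.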

Keywords: analytical approximation; asymptotic rare and weak model; higher criticism; importance sampling; p-value computation.

Grants and funding

R01 LM014142/LM/NLM NIH HHS/United States