Detecting weak, systematic signals hidden in a large collection of $p$-values published in academic journals is instrumental to identifying and understanding publication bias and $p$-value hacking in the social and economic sciences. Given two probability distributions $P$ (null) and $Q$ (signal), we study the problem of distinguishing weak signals from the null $P$ based on $n$ independent samples. We model weak signals via displacement interpolation between $P$ and $Q$, where the signal strength vanishes as $n$ grows. We propose a hypothesis testing procedure based on the Wasserstein distance from optimal transport theory, derive sharp conditions under which detection is possible, and use empirical process theory to characterize exactly the asymptotic Type I and Type II errors at the detection boundary. Applying our testing procedure to real data sets of published $p$-values across academic journals, we demonstrate that a rigorous testing procedure can detect weak signals that are otherwise indistinguishable from the null.
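To make the setup concrete, here is a minimal Python sketch of a Wasserstein-type test for $p$-values. It is an illustration under stated assumptions, not the procedure or the critical values derived in the paper. It assumes the null $P$ is Uniform(0, 1), uses a hypothetical signal $Q$ with quantile function $u \mapsto u^3$, samples from the one-dimensional displacement interpolant by mixing quantile functions, and calibrates the test by Monte Carlo simulation rather than by asymptotic theory. All function names are invented for this example.

```python
import numpy as np

def w2_sq_to_uniform(x):
    """Exact squared 2-Wasserstein distance between the empirical law of x
    and Uniform(0, 1), computed from the quantile representation
    W_2^2 = integral over (0, 1) of (F_n^{-1}(u) - u)^2 du."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    a = np.arange(n) / n          # left endpoints (i - 1) / n
    b = np.arange(1, n + 1) / n   # right endpoints i / n
    # On ((i-1)/n, i/n] the empirical quantile equals x_(i); integrate in closed form.
    return float(np.sum((x - a) ** 3 - (x - b) ** 3) / 3.0)

def sample_interpolant(n, t, rng, q_quantile=lambda u: u ** 3):
    """Draw n samples from the displacement interpolant between Uniform(0, 1)
    and a signal Q. In one dimension its quantile function is
    (1 - t) * u + t * Q^{-1}(u). The default Q^{-1} is a hypothetical choice."""
    u = rng.uniform(size=n)
    return (1.0 - t) * u + t * q_quantile(u)

def monte_carlo_pvalue(x, rng, n_sim=199):
    """Monte Carlo p-value of the statistic, simulating the Uniform(0, 1) null."""
    obs = w2_sq_to_uniform(x)
    null = np.array([w2_sq_to_uniform(rng.uniform(size=len(x)))
                     for _ in range(n_sim)])
    return (1 + np.sum(null >= obs)) / (n_sim + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for t in (0.0, 0.05, 0.5):
        x = sample_interpolant(500, t, rng)
        print(f"t = {t}: Monte Carlo p-value = {monte_carlo_pvalue(x, rng):.3f}")
```

Setting `t = 0` reproduces the null, and shrinking `t` as `n` grows mimics the weak-signal regime described above. Where detection stops being possible in that regime is the question the paper answers, and this sketch does not attempt to reproduce it.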