Significant recent progress has been made on deriving combination rules that take as input a set of arbitrarily dependent p-values and produce as output a single valid p-value. Here, we show that under the assumption of exchangeability of the p-values, many of those rules can be improved (made more powerful). While this observation has practical implications by itself (for example, under repeated tests involving data splitting), it also has implications for combining arbitrarily dependent p-values, since the latter can be made exchangeable by applying a uniformly random permutation. In particular, we derive several simple randomized combination rules for arbitrarily dependent p-values that are more powerful than their deterministic counterparts. For example, we derive randomized and exchangeable improvements of well-known p-value combination rules like "twice the median" and "twice the average", as well as the geometric and harmonic means. The main technical advance is to show that all these combination rules can be obtained by calibrating the p-values to e-values (using an $\alpha$-dependent calibrator), averaging those e-values, converting to a level-$\alpha$ test using Markov's inequality, and finally obtaining p-values by combining this family of tests. The improvements are delivered via recent randomized and exchangeable variants of Markov's inequality.
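To make the calibrate, average, and apply-Markov pipeline concrete, the following is a minimal sketch for the "twice the average" rule only, written as level-$\alpha$ tests rather than combined p-values. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and the calibrator $f_\alpha(p) = (2 - 2p/\alpha)_+ / \alpha$ is one choice that recovers "twice the average" (it integrates to 1 over $[0,1]$, so each calibrated value is an e-value). The randomized variant replaces the threshold $1/\alpha$ by $U/\alpha$ with $U$ uniform on $[0,1]$ and independent of the data; the exchangeable variant rejects if any running mean of the randomly permuted e-values reaches $1/\alpha$.

```python
import random
from itertools import accumulate


def twice_the_average_test(p, alpha):
    """Deterministic rule: reject when 2 * mean(p) <= alpha."""
    return 2.0 * sum(p) / len(p) <= alpha


def calibrate(p, alpha):
    """Assumed alpha-dependent calibrator: (2 - 2p/alpha)_+ / alpha."""
    return [max(2.0 - 2.0 * pi / alpha, 0.0) / alpha for pi in p]


def markov_test(p, alpha):
    """Average the e-values, then reject when the mean is >= 1/alpha."""
    e = calibrate(p, alpha)
    return sum(e) / len(e) >= 1.0 / alpha


def randomized_markov_test(p, alpha, rng):
    """Randomized Markov: compare the mean e-value to U/alpha."""
    e = calibrate(p, alpha)
    u = rng.random()  # independent uniform draw on [0, 1)
    return sum(e) / len(e) >= u / alpha


def exchangeable_markov_test(p, alpha, rng):
    """Exchangeable Markov: reject if any running mean is >= 1/alpha."""
    shuffled = list(p)
    rng.shuffle(shuffled)  # a uniform permutation makes the p-values exchangeable
    e = calibrate(shuffled, alpha)
    running_sums = accumulate(e)
    return any(s / k >= 1.0 / alpha for k, s in enumerate(running_sums, start=1))


if __name__ == "__main__":
    rng = random.Random(0)
    p_values = [0.01, 0.02, 0.03, 0.2]  # illustrative inputs, not real data
    alpha = 0.05
    print(twice_the_average_test(p_values, alpha))
    print(markov_test(p_values, alpha))
    print(randomized_markov_test(p_values, alpha, rng))
    print(exchangeable_markov_test(p_values, alpha, rng))
```

Whenever the deterministic rule rejects, so does each of the other three tests in this sketch, since truncating at zero can only raise the mean e-value, $U \le 1$, and the last running mean equals the overall mean. That ordering is the sense in which the randomized and exchangeable variants are at least as powerful.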