Safe Testing

We develop the theory of hypothesis testing based on the E-value, a notion of evidence that, unlike the p-value, allows for effortlessly combining results from several studies in the common scenario where the decision to perform a new study may depend on previous outcomes. Tests based on E-values are safe, i.e. they preserve Type-I error guarantees, under such optional continuation. We define growth-rate optimality (GRO) as an analogue of power in an optional continuation context, and we show how to construct GRO E-variables for general testing problems with composite null and alternative, emphasizing models with nuisance parameters. GRO E-values take the form of Bayes factors with special priors. We illustrate the theory using several classic examples including a one-sample safe t-test (in which the right Haar prior turns out to be GRO) and the 2x2 contingency table (in which the GRO prior is different from standard priors). Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, E-values and the corresponding tests may provide a methodology acceptable to adherents of all three schools.
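To make the optional-continuation claim concrete, here is a minimal sketch of the simplest case only: a point null N(0, 1) against a point alternative N(mu_alt, 1), where the likelihood ratio has expectation 1 under the null and is therefore a valid E-variable. It is not the GRO construction for composite hypotheses described above. The function names, the value of `mu_alt`, the study sizes and the stopping rule are all assumptions made for illustration. E-values from successive studies are multiplied, and the null is rejected once the running product reaches 1/alpha, even though the decision to run another study depends on what has been seen so far.

```python
import math
import random


def likelihood_ratio_e_value(xs, mu_alt=0.5):
    """E-value of one study: likelihood ratio of N(mu_alt, 1) to N(0, 1).

    Under the null N(0, 1) this ratio has expectation 1, so it is an E-variable.
    mu_alt is an illustrative choice, not a value taken from the text.
    """
    log_e = sum(mu_alt * x - 0.5 * mu_alt ** 2 for x in xs)
    return math.exp(log_e)


def run_optional_continuation(true_mu, rng, alpha=0.05,
                              max_studies=5, n_per_study=20):
    """Run studies one at a time and multiply their E-values.

    A new study is started only if the running product has not yet
    reached 1/alpha, so continuation depends on earlier outcomes.
    Returns True if the null is rejected.
    """
    running_product = 1.0
    for _ in range(max_studies):
        xs = [rng.gauss(true_mu, 1.0) for _ in range(n_per_study)]
        running_product *= likelihood_ratio_e_value(xs)
        if running_product >= 1.0 / alpha:
            return True
    return False


def rejection_rate(true_mu, n_runs=2000, seed=0, alpha=0.05):
    """Monte Carlo estimate of how often the procedure rejects the null."""
    rng = random.Random(seed)
    rejections = sum(
        run_optional_continuation(true_mu, rng, alpha=alpha)
        for _ in range(n_runs)
    )
    return rejections / n_runs


if __name__ == "__main__":
    print("rejection rate under the null:", rejection_rate(true_mu=0.0))
    print("rejection rate under the alternative:", rejection_rate(true_mu=0.5))
```

In this sketch the rejection rate under the null should stay at or below alpha up to simulation noise, while the rate under the alternative plays the role that power plays in a fixed-sample test.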
