To test whether a treatment is perceptibly different from a placebo in a randomized experiment with covariates, classical nonparametric tests based on ranks of observations or residuals have been employed (e.g., by Rosenbaum), with finite-sample valid inference enabled via permutations. This paper proposes a different principle on which to base inference: if, with access to all covariates and outcomes but without access to any treatment assignments, one can form a ranking of the subjects that is sufficiently nonrandom (e.g., mostly treated followed by mostly control), then one can confidently conclude that there must be a treatment effect. Based on a more nuanced, quantifiable version of this principle, we design an interactive test called i-bet: the analyst forms a single permutation of the subjects one element at a time; at each step, the analyst bets toy money on whether that subject was actually treated, and learns the truth immediately after. The wealth process forms a real-valued measure of evidence against the global causal null, and we may reject the null at level $\alpha$ if the wealth ever crosses $1/\alpha$. Apart from providing a fresh "game-theoretic" principle on which to base the causal conclusion, the i-bet has other statistical and computational benefits, for example (A) it allows a human to adaptively design the test statistic as increasing amounts of data are revealed (along with any working causal models and prior knowledge), and (B) it does not require permutation resampling: under the null, the wealth forms a nonnegative martingale, and the type-1 error control of the aforementioned decision rule follows from a tight inequality by Ville. Further, if the null is not rejected, new subjects can later be added and the test can simply be continued without any corrections (unlike with permutation p-values).
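The betting mechanism described above can be illustrated with a minimal sketch. It is not the paper's implementation and rests on assumptions that the abstract does not state: each subject is treated independently with probability 1/2, a bet is a fraction `bet` in [-1, 1] of current wealth (positive means "treated"), and wealth is multiplied by `1 + bet` on a correct guess and `1 - bet` on an incorrect one. Under those assumptions the expected multiplier under the null is 1, so wealth is a nonnegative martingale. The names `run_ibet` and `make_outcome_strategy` are hypothetical.

```python
def run_ibet(assignments, strategy, alpha=0.05):
    """Simplified betting test (assumes independent 1/2 treatment probability).

    assignments: list of 0/1 treatment indicators, hidden from the strategy
                 until the corresponding subject has been bet on.
    strategy:    callable(remaining, history) -> (subject, bet), where
                 bet in [-1, 1] is the wealth fraction staked on "treated".
    Returns (rejected, wealth_path).
    """
    remaining = set(range(len(assignments)))
    history = []            # (subject, revealed assignment) pairs seen so far
    wealth = 1.0
    path = [wealth]
    while remaining:
        subject, bet = strategy(sorted(remaining), list(history))
        if subject not in remaining or not -1.0 <= bet <= 1.0:
            raise ValueError("invalid subject or bet")
        remaining.remove(subject)
        # Reveal the truth only after the bet has been placed.
        sign = 1.0 if assignments[subject] == 1 else -1.0
        wealth *= 1.0 + bet * sign
        path.append(wealth)
        history.append((subject, assignments[subject]))
        if wealth >= 1.0 / alpha:   # threshold crossing, justified by Ville's inequality
            return True, path
    return False, path


def make_outcome_strategy(outcomes, bet_size=0.5):
    """Hypothetical strategy: visit subjects by decreasing outcome and bet
    'treated' for outcomes at or above the upper median, 'control' otherwise.
    It looks at outcomes only, never at unrevealed assignments."""
    cutoff = sorted(outcomes)[len(outcomes) // 2]

    def strategy(remaining, history):
        subject = max(remaining, key=lambda i: outcomes[i])
        bet = bet_size if outcomes[subject] >= cutoff else -bet_size
        return subject, bet

    return strategy


# Toy data (made up for illustration): treated subjects have larger outcomes.
outcomes = [5.1, 4.8, 5.5, 6.0, 4.9, 1.0, 1.2, 0.7, 1.1, 0.9]
assignments = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rejected, path = run_ibet(assignments, make_outcome_strategy(outcomes))
print(rejected, len(path) - 1, path[-1])
```

In this toy run every guess is correct, so wealth grows by a factor of 1.5 per step and first exceeds $1/\alpha = 20$ at the eighth bet. A strategy that always stakes 0 leaves wealth at 1 and never rejects, which is the sense in which the test only gains power from informative rankings.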