Testing whether a variable of interest affects the outcome is one of the most fundamental problems in statistics and is often the main scientific question of interest. To tackle this problem, the conditional randomization test (CRT) is widely used to test the independence of variable(s) of interest (X) and an outcome (Y), holding other variable(s) (Z) fixed. The CRT uses randomization-based (design-based) inference that relies solely on the iid sampling of (X,Z) to produce exact finite-sample p-values, which can be constructed using any test statistic. We propose a new method, the adaptive randomization test (ART), that tackles the independence problem while allowing the data to be adaptively sampled. We first showcase the ART in a particular multi-armed bandit problem known as the normal-mean model. In this setting, we theoretically characterize the power of both the iid sampling procedure and the adaptive sampling procedure, and empirically find that the ART can uniformly outperform the CRT that pulls all arms independently with equal probability. Surprisingly, we also find that the ART can be more powerful than even the CRT that uses an oracle iid sampling procedure when the signal is relatively strong. We believe that the proposed adaptive procedure is successful because it takes arms that may initially look like "fake" signals due to random chance and stabilizes them closer to "null" signals. We additionally apply the ART to a popular factorial survey design setting known as conjoint analysis. We find similar results through simulations and a recent application concerning the role of gender discrimination in political candidate evaluation.
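To make the resampling logic of the iid CRT concrete, the following is a minimal Python sketch, not the authors' implementation and not the ART itself. It assumes a hypothetical known conditional law X | Z ~ N(Z1 + Z2, 1) and a hypothetical test statistic (the absolute correlation between Y and the part of X not explained by Z); the names `crt_pvalue`, `sample_x_given_z`, and `abs_residual_corr` are illustrative only.

```python
import numpy as np

def crt_pvalue(x, y, z, sample_x_given_z, statistic, n_resamples=500, rng=None):
    """Monte Carlo CRT p-value: compare the observed statistic with
    statistics recomputed on fresh draws of X from its law given Z."""
    rng = np.random.default_rng(rng)
    t_obs = statistic(x, y, z)
    t_null = np.array([
        statistic(sample_x_given_z(z, rng), y, z) for _ in range(n_resamples)
    ])
    # The +1 terms count the observed draw itself, which keeps the p-value valid.
    return (1 + np.sum(t_null >= t_obs)) / (n_resamples + 1)

def sample_x_given_z(z, rng):
    # Assumed (hypothetical) design: X | Z ~ N(Z1 + Z2, 1).
    return z.sum(axis=1) + rng.standard_normal(z.shape[0])

def abs_residual_corr(x, y, z):
    # Hypothetical statistic: |corr| between Y and X minus its mean given Z.
    resid = x - z.sum(axis=1)
    return abs(np.corrcoef(resid, y)[0, 1])

# Simulated data under the assumed design.
rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal((n, 2))
x = sample_x_given_z(z, rng)
y_null = z[:, 0] + rng.standard_normal(n)            # Y independent of X given Z
y_alt = z[:, 0] + 1.0 * x + rng.standard_normal(n)   # Y depends on X

p_null = crt_pvalue(x, y_null, z, sample_x_given_z, abs_residual_corr, rng=1)
p_alt = crt_pvalue(x, y_alt, z, sample_x_given_z, abs_residual_corr, rng=2)
print(f"p-value under the null: {p_null:.3f}")
print(f"p-value under the alternative: {p_alt:.3f}")
```

Any statistic can be substituted for `abs_residual_corr` without affecting validity, since validity comes only from redrawing X from its assumed conditional distribution. An adaptive scheme such as the ART would instead need to resample X in a way that respects the sequential sampling rule, which this sketch does not attempt.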