We study the problem of designing consistent sequential two-sample tests in a nonparametric setting. Guided by the principle of testing by betting, we reframe this task as that of selecting a sequence of payoff functions that maximize the wealth of a fictitious bettor who bets against the null in a repeated game. In this setting, the relative increase in the bettor's wealth has a precise interpretation as a measure of evidence against the null, and thus our sequential test rejects the null when the wealth crosses an appropriate threshold. We develop a general framework for setting up the betting game for two-sample testing, in which the payoffs are selected by a prediction strategy as data-driven predictable estimates of the witness function associated with the variational representation of a statistical distance measure, such as an integral probability metric (IPM). We then formally relate the statistical properties of the test (such as consistency, type-II error exponent, and expected sample size) to the regret of the corresponding prediction strategy. We construct a practical sequential two-sample test by instantiating our general strategy with the kernel-MMD metric, and demonstrate its ability to adapt to the difficulty of the unknown alternative through theoretical and empirical results. Our framework is versatile and extends easily to other problems; we illustrate this by applying our approach to construct consistent tests for the following problems: (i) time-varying two-sample testing with non-exchangeable observations, and (ii) an abstract class of "invariant" testing problems, including symmetry and independence testing.
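The betting game described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names, the Gaussian kernel with a fixed bandwidth, the factor 1/2 used to keep payoffs in [-1, 1], and the online-Newton-step-style update of the betting fraction are all assumptions made for this sketch. At each round, the witness function of the kernel MMD is estimated from past observations only (so the payoff is predictable), the bettor's wealth is multiplied by one plus the bet times the payoff, and the null is rejected once the wealth reaches 1/alpha.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix for 1-D arrays a and b; entries lie in (0, 1]."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def sequential_mmd_betting_test(xs, ys, alpha=0.05, bandwidth=1.0):
    """Hypothetical sketch of a sequential two-sample test by betting.

    xs, ys: paired 1-D observation streams (X_t, Y_t).
    Returns (stopping_time, wealth); stopping_time is None if the null
    was not rejected within the available observations.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    wealth = 1.0   # bettor starts with unit wealth
    lam = 0.0      # betting fraction, kept in [0, 1/2] (assumed range)
    a = 1.0        # running sum of squared gradients for the update
    c = 2.0 / (2.0 - np.log(3.0))  # step constant of the assumed update

    for t in range(len(xs)):
        if t == 0:
            payoff = 0.0  # no past data yet, so no bet is informative
        else:
            past_x, past_y = xs[:t], ys[:t]

            def witness(z):
                # Plug-in MMD witness estimate built from past data only.
                z = np.array([z])
                return (gaussian_kernel(past_x, z, bandwidth).mean()
                        - gaussian_kernel(past_y, z, bandwidth).mean())

            # Witness values lie in (-1, 1); halving keeps the payoff in (-1, 1).
            payoff = 0.5 * (witness(xs[t]) - witness(ys[t]))

        # Wealth update; with lam <= 1/2 the factor stays positive.
        wealth *= 1.0 + lam * payoff
        if wealth >= 1.0 / alpha:
            return t + 1, wealth  # reject the null at round t + 1

        # Gradient step on the loss -log(1 + lam * payoff), then clip.
        grad = -payoff / (1.0 + lam * payoff)
        a += grad**2
        lam = float(np.clip(lam - c * grad / a, 0.0, 0.5))

    return None, wealth
```

For example, calling `sequential_mmd_betting_test(rng.normal(0, 1, 500), rng.normal(2, 1, 500))` with `rng = np.random.default_rng(0)` feeds the test two well-separated Gaussian streams. Because each payoff has zero conditional mean when both streams share a distribution, the wealth is a nonnegative martingale under the null, and the threshold 1/alpha controls the type-I error at level alpha by Ville's inequality.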