Independence testing is a fundamental and classical statistical problem that has been extensively studied in the batch setting, where the sample size is fixed before data are collected. However, practitioners often prefer procedures that adapt to the complexity of the problem at hand instead of fixing the sample size in advance. Ideally, such procedures should (a) allow stopping earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence as new data are collected, while controlling the false alarm rate. It is well known that classical batch tests are not tailored to streaming data settings: valid inference after peeking at the data requires correcting for multiple testing, and such corrections generally result in low power. In this paper, we design sequential kernelized independence tests (SKITs), based on the principle of testing by betting, that overcome these shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures such as the Hilbert-Schmidt independence criterion (HSIC) and the constrained-covariance criterion (COCO). Importantly, we also generalize the framework to non-i.i.d. time-varying settings, for which no batch tests exist. We demonstrate the power of our approaches on both simulated and real data.
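To make the testing-by-betting principle concrete, the following is a minimal sketch and not the SKIT procedure itself. It assumes i.i.d. scalar pairs and uses a fixed witness function g(x, y) = tanh(x)·tanh(y) in place of a kernel-based witness learned from past data. The names `payoff` and `sequential_betting_test`, the bet-size rule, and the cap `max_fraction` are all hypothetical choices made for illustration. The bettor starts with wealth 1, bets on consecutive pairs of observations with a payoff that has zero mean under independence, and rejects once the wealth reaches 1/alpha. Under the null the wealth is a nonnegative martingale, so by Ville's inequality this stopping rule keeps the false alarm rate at or below alpha regardless of when one stops.

```python
import math
import random


def payoff(pair_a, pair_b):
    """Bounded payoff in [-1, 1] with zero mean when X and Y are independent."""
    (x1, y1), (x2, y2) = pair_a, pair_b
    # Assumed fixed witness g(x, y) = tanh(x) * tanh(y) (not learned, not a kernel).
    # g(x1,y1) + g(x2,y2) - g(x1,y2) - g(x2,y1) factorizes into the product below;
    # the factor 0.25 keeps the payoff inside [-1, 1].
    return 0.25 * (math.tanh(x1) - math.tanh(x2)) * (math.tanh(y1) - math.tanh(y2))


def sequential_betting_test(stream, alpha=0.05, max_fraction=0.5):
    """Return (rejected, rounds_used, final_wealth) for a stream of (x, y) pairs."""
    wealth = 1.0
    sum_f = 0.0   # running sum of past payoffs
    sum_f2 = 0.0  # running sum of past squared payoffs
    rounds = 0
    it = iter(stream)
    for first in it:
        second = next(it, None)
        if second is None:
            break  # an odd leftover observation cannot form a betting round
        # The bet size depends on past payoffs only, so it is fixed before
        # the current payoff is revealed (hypothetical plug-in rule).
        lam = sum_f / sum_f2 if sum_f2 > 0.0 else 0.0
        lam = max(-max_fraction, min(max_fraction, lam))
        f = payoff(first, second)
        wealth *= 1.0 + lam * f  # factor is at least 1 - max_fraction > 0
        sum_f += f
        sum_f2 += f * f
        rounds += 1
        if wealth >= 1.0 / alpha:
            return True, rounds, wealth  # enough evidence against independence
    return False, rounds, wealth


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    dependent = [(x, x + 0.1 * rng.gauss(0.0, 1.0)) for x in xs]
    print(sequential_betting_test(dependent))
```

Because the bet size is capped and the payoff is bounded, the wealth never becomes negative, which is what the martingale argument requires. A fixed witness function can only detect the kind of dependence it happens to correlate with; replacing it with a witness estimated from past observations is what a kernelized approach would add.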