This paper develops a model-free sequential test for conditional independence. The proposed test lets researchers analyze an incoming i.i.d. data stream with an arbitrary dependence structure and safely determine whether a feature is conditionally associated with the response under study. Data points are processed online, as soon as they arrive, and data acquisition can stop as soon as significant results are detected, with rigorous control of the type-I error rate. Our test can be combined with any machine learning algorithm, however sophisticated, to make the most efficient use of the data. The method draws on two statistical frameworks. The first is the model-X conditional randomization test, a test for conditional independence that is valid in offline settings where the sample size is fixed in advance. The second is testing by betting, a "game-theoretic" approach to sequential hypothesis testing. We conduct synthetic experiments to demonstrate the advantage of our test over out-of-the-box sequential tests that account for the multiplicity of tests across the time horizon, and we demonstrate the practicality of our proposal by applying it to real-world tasks.
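The abstract does not spell out the construction. As a rough illustration only, the sketch below shows one simple way the two ingredients can be combined. It assumes the conditional law of X given Z is known and can be sampled, which is the model-X assumption. At each step the sketch draws a dummy copy of the feature from that law, compares a fixed test statistic on the real and the dummy feature, and wagers a fraction `lam` of its wealth that the real feature scores higher. Under the null, the real and dummy features are exchangeable given (Y, Z), so each wager is fair and the wealth is a nonnegative martingale. By Ville's inequality, rejecting once the wealth reaches 1/alpha therefore keeps the type-I error at most alpha at any stopping time. Several pieces are hypothetical choices made here, not the paper's: the Gaussian data-generating process, `RHO`, the product statistic `product_statistic`, and the fixed bet size. In practice the statistic would more likely be a learned model refit on past data only.

```python
import numpy as np

RHO = 0.8  # hypothetical: in the simulated data, X = RHO * Z + N(0, 1)


def sequential_crt_by_betting(stream, sample_x_given_z, statistic,
                              alpha=0.05, lam=0.25, rng=None):
    """Sketch of a sequential model-X test of H0: X independent of Y given Z.

    Assumes the conditional law of X given Z is known and can be sampled via
    `sample_x_given_z(z, rng)`. `statistic(x, y, z)` must be fixed in advance
    (or depend only on past data). Returns (stopping_time, wealth), where
    stopping_time is None if the stream ends without a rejection.
    """
    rng = np.random.default_rng() if rng is None else rng
    wealth = 1.0
    for t, (x, y, z) in enumerate(stream, start=1):
        # Draw a "dummy" copy of X from its known conditional law given Z.
        x_dummy = sample_x_given_z(z, rng)
        # Under H0, X and the dummy are exchangeable given (Y, Z), so this
        # sign has conditional mean zero and the bet below is fair.
        s = np.sign(statistic(x, y, z) - statistic(x_dummy, y, z))
        wealth *= 1.0 + lam * s  # stays positive because 0 <= lam < 1
        # Ville's inequality: P(sup_t wealth_t >= 1/alpha) <= alpha under H0.
        if wealth >= 1.0 / alpha:
            return t, wealth
    return None, wealth


def simulate_stream(n, beta, rng):
    """Hypothetical Gaussian stream; beta = 0 makes H0 true."""
    for _ in range(n):
        z = rng.normal()
        x = RHO * z + rng.normal()
        y = beta * x + z + rng.normal()
        yield x, y, z


def sample_x_given_z(z, rng):
    """Known (model-X) conditional sampler matching simulate_stream."""
    return RHO * z + rng.normal()


def product_statistic(x, y, z):
    """Hypothetical fixed statistic; large when X and Y co-move."""
    return x * y


if __name__ == "__main__":
    stream = simulate_stream(3000, beta=2.0, rng=np.random.default_rng(0))
    t, w = sequential_crt_by_betting(stream, sample_x_given_z,
                                     product_statistic,
                                     rng=np.random.default_rng(1))
    print("stopped at step", t, "with wealth", w)
```

A fixed `lam` keeps the sketch short. A predictable bet size, chosen from past outcomes only, together with a statistic refit on past points, would typically detect a signal faster while preserving validity.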