We propose E-C2ST, a classifier two-sample test for high-dimensional data based on E-values. Compared to $p$-value-based tests, tests based on E-values have finite-sample guarantees for the type I error. E-C2ST combines ideas from existing work on split likelihood ratio tests and predictive independence testing. The resulting E-values incorporate information about the alternative hypothesis. We demonstrate the utility of E-C2ST on simulated and real-life data. In all experiments, as the sample size grows, E-C2ST starts with lower power than the other methods, as expected, but its power eventually converges to one. At the same time, E-C2ST's type I error stays substantially below the chosen significance level, which is not always the case for the baseline methods. Finally, we use an MRI dataset to demonstrate that multiplying E-values from multiple independently conducted studies yields a combined E-value that retains the finite-sample type I error guarantee while increasing the power.
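To make the decision rule and the combination step concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary classifier has already been fit on a separate training split and that only its held-out predicted probabilities are available. It further assumes the E-value takes a split-likelihood-ratio form: the classifier's predicted label likelihood divided by the maximum-likelihood label marginal under the null. The function names (`log_e_value`, `combine_log_e_values`, `reject_null`) are hypothetical. The rejection threshold $1/\alpha$ follows from Markov's inequality, since an E-value has expectation at most one under the null.

```python
import math


def log_e_value(probs_class1, labels):
    """Log E-value on held-out data (assumed split-likelihood-ratio form).

    probs_class1[i]: predicted P(label = 1 | x_i) from a classifier that was
                     fit on a separate training split (assumption).
    labels[i]:       0 or 1, indicating which sample x_i was drawn from.
    """
    n = len(labels)
    pi_hat = sum(labels) / n  # MLE of the label marginal under the null
    eps = 1e-12               # clip predictions to avoid log(0)
    log_e = 0.0
    for p, y in zip(probs_class1, labels):
        p = min(max(p, eps), 1.0 - eps)
        numerator = p if y == 1 else 1.0 - p
        denominator = pi_hat if y == 1 else 1.0 - pi_hat
        log_e += math.log(numerator) - math.log(denominator)
    return log_e


def combine_log_e_values(log_e_values):
    """Multiply E-values from independent studies (a sum in log space)."""
    return sum(log_e_values)


def reject_null(log_e, alpha=0.05):
    """Reject when E >= 1 / alpha; by Markov's inequality this keeps the
    type I error at or below alpha for any sample size."""
    return log_e >= math.log(1.0 / alpha)


# Toy illustration with made-up predictions: two small independent studies.
labels = [1, 1, 0, 0]
probs = [0.9, 0.9, 0.1, 0.1]
study_a = log_e_value(probs, labels)
study_b = log_e_value(probs, labels)
combined = combine_log_e_values([study_a, study_b])
print(reject_null(study_a), reject_null(combined))
```

In this toy example neither study alone crosses the threshold $1/\alpha = 20$, but the product of the two E-values does. Working in log space avoids overflow when many held-out likelihood ratios are multiplied.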