We propose data-dependent test statistics based on a one-dimensional witness function, which we call witness two-sample tests (WiTS tests). We first optimize the witness function by maximizing an asymptotic test-power objective and then use as the test statistic the difference in means of the witness evaluated on two held-out test samples. When the witness function belongs to a reproducing kernel Hilbert space, we show that the optimal witness is given via kernel Fisher discriminant analysis, whose solution we compute in closed form. We show that the WiTS test based on a characteristic kernel is consistent against any fixed alternative. Our experiments demonstrate that the WiTS test can achieve higher test power than existing two-sample tests with optimized kernels, suggesting that learning a high- or infinite-dimensional representation of the data may not be necessary for two-sample testing. The proposed procedure works beyond kernel methods, allowing practitioners to apply it within their preferred machine learning framework.
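The two-stage procedure described above (fit a witness on one split, then compare witness means on held-out samples) can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the fixed `bandwidth`, the ridge term `reg`, the use of a pooled centered covariance, the 50/50 split, and the permutation-based p-value are all assumptions made here for illustration, and the names `gaussian_kernel`, `fit_witness`, and `wits_test` are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    # Pairwise Gaussian kernel matrix between rows of a and rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def fit_witness(x_tr, y_tr, bandwidth=1.0, reg=1e-3):
    # Regularized kernel Fisher discriminant direction (assumed form):
    # solve (Sigma + reg * I) h = mu_X - mu_Y with h = sum_i alpha_i k(z_i, .),
    # where Sigma is the pooled, centered empirical covariance operator.
    z = np.vstack([x_tr, y_tr])
    n_x, n_y = len(x_tr), len(y_tr)
    n = n_x + n_y
    K = gaussian_kernel(z, z, bandwidth)
    # Coefficients of the mean-embedding difference in the basis k(z_i, .).
    d = np.concatenate([np.full(n_x, 1.0 / n_x), np.full(n_y, -1.0 / n_y)])
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    alpha = np.linalg.solve(H @ K / n + reg * np.eye(n), d)
    # Return the one-dimensional witness as a callable.
    return lambda pts: gaussian_kernel(pts, z, bandwidth) @ alpha


def wits_test(x, y, n_perm=500, seed=0, **fit_kwargs):
    rng = np.random.default_rng(seed)
    # Stage 1: learn the witness on the first half of each sample.
    x_tr, x_te = x[: len(x) // 2], x[len(x) // 2 :]
    y_tr, y_te = y[: len(y) // 2], y[len(y) // 2 :]
    h = fit_witness(x_tr, y_tr, **fit_kwargs)
    # Stage 2: difference in witness means on the held-out halves.
    hx, hy = h(x_te), h(y_te)
    stat = hx.mean() - hy.mean()
    # One-sided permutation p-value on the scalar witness values
    # (an assumption; an asymptotic threshold could be used instead).
    pooled = np.concatenate([hx, hy])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if perm[: len(hx)].mean() - perm[len(hx) :].mean() >= stat:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return stat, p_value
```

Because the witness maps each point to a single real number, the held-out stage reduces to a two-sample comparison of scalars, which is why the permutation step is cheap. In practice the bandwidth and regularization would themselves be chosen on the training split rather than fixed as here.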