We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural networks, trained to maximize test power. These tests adapt to variations in distribution smoothness and shape over space, and are especially suited to high-dimensional and complex data. By contrast, the simpler kernels used in prior kernel testing work are spatially homogeneous and adaptive only in lengthscale. We explain how this scheme includes popular classifier-based two-sample tests as a special case, but improves on them in general. We provide the first proof of consistency for the proposed adaptation method, which applies both to kernels on deep features and to simpler radial basis kernels or multiple kernel learning. In experiments, we establish the superior performance of our deep kernels in hypothesis testing on benchmark and real-world data. The code for our deep-kernel-based two-sample tests is available at https://github.com/fengliu90/DK-for-TST.
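To make the setting concrete, the following is a minimal sketch of a kernel two-sample test of the simpler kind the abstract contrasts against: a fixed Gaussian kernel with a single lengthscale, not the deep kernel proposed here. It assumes the maximum mean discrepancy (MMD) as the test statistic and a permutation procedure for the p-value, neither of which the abstract names explicitly. The function names (`gaussian_kernel`, `mmd2_unbiased`, `mmd_permutation_test`) and all parameter values are hypothetical and are not taken from the authors' repository.

```python
import math
import random


def gaussian_kernel(u, v, lengthscale):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def mmd2_unbiased(gram, idx_x, idx_y):
    """Unbiased estimate of squared MMD from a precomputed Gram matrix.

    idx_x and idx_y are the row/column indices assigned to each sample.
    """
    n, m = len(idx_x), len(idx_y)
    # Within-sample terms exclude the diagonal (i == j).
    xx = sum(gram[i][j] for i in idx_x for j in idx_x if i != j) / (n * (n - 1))
    yy = sum(gram[i][j] for i in idx_y for j in idx_y if i != j) / (m * (m - 1))
    # Cross-sample term uses all pairs.
    xy = sum(gram[i][j] for i in idx_x for j in idx_y) / (n * m)
    return xx + yy - 2.0 * xy


def mmd_permutation_test(x, y, lengthscale=1.0, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value) for samples x and y."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n = len(x)
    # The Gram matrix of the pooled sample is computed once and reused.
    gram = [[gaussian_kernel(p, q, lengthscale) for q in pooled] for p in pooled]
    indices = list(range(len(pooled)))
    observed = mmd2_unbiased(gram, indices[:n], indices[n:])
    exceed = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sample labels are exchangeable,
        # so reshuffling them simulates the null distribution.
        rng.shuffle(indices)
        if mmd2_unbiased(gram, indices[:n], indices[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)


# Illustrative data (assumed): two 2-D Gaussians that differ by a mean shift.
data_rng = random.Random(1)
x = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(40)]
y = [(data_rng.gauss(1.5, 1.0), data_rng.gauss(1.5, 1.0)) for _ in range(40)]
stat, p_value = mmd_permutation_test(x, y)
print(f"MMD^2 estimate: {stat:.4f}, p-value: {p_value:.4f}")
```

In this sketch the only tunable quantity is `lengthscale`, which is the limitation the abstract points to. The proposed tests instead replace the fixed kernel with one built on learned deep features and choose its parameters to maximize test power.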