We propose a novel nonparametric two-sample test based on the Maximum Mean Discrepancy (MMD), which is constructed by aggregating tests with different kernel bandwidths. This aggregation procedure, called MMDAgg, ensures that test power is maximised over the collection of kernels used, without requiring held-out data for kernel selection (which results in a loss of test power), or arbitrary kernel choices such as the median heuristic. We work in the non-asymptotic framework, and prove that our aggregated test is minimax adaptive over Sobolev balls. Our guarantees are not restricted to a specific kernel, but hold for any product of one-dimensional translation invariant characteristic kernels which are absolutely and square integrable. Moreover, our results apply for popular numerical procedures to determine the test threshold, namely permutations and the wild bootstrap. Through numerical experiments on both synthetic and real-world datasets, we demonstrate that MMDAgg outperforms alternative state-of-the-art approaches to MMD kernel adaptation for two-sample testing.
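To make the idea concrete, here is a minimal sketch (not the authors' implementation) of the kind of procedure the abstract describes: MMD two-sample tests are run over a collection of Gaussian kernel bandwidths, each calibrated by permutations, and the results are aggregated. All function names are illustrative, and the simple Bonferroni correction across bandwidths is a hedged stand-in for MMDAgg's more refined weighted aggregation.

```python
import numpy as np

def gaussian_gram(Z, bandwidth):
    """Gaussian kernel Gram matrix: k(z, z') = exp(-||z - z'||^2 / bandwidth^2)."""
    sq_dists = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / bandwidth ** 2)

def mmd2_biased(K, n):
    """Biased MMD^2 estimate from the joint Gram matrix of (X stacked on Y),
    where the first n rows correspond to the X sample."""
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

def mmd_agg_test(X, Y, bandwidths, alpha=0.05, n_perms=200, seed=0):
    """Aggregated MMD test (illustrative sketch): compute a permutation
    p-value for each bandwidth, then reject if the smallest p-value beats
    a Bonferroni-corrected level alpha / len(bandwidths)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    Z = np.vstack([X, Y])
    p_values = []
    for bw in bandwidths:
        K = gaussian_gram(Z, bw)
        stat = mmd2_biased(K, n)
        # Permute sample labels by permuting rows/columns of the Gram matrix.
        null_stats = np.array([
            mmd2_biased(K[np.ix_(p, p)], n)
            for p in (rng.permutation(len(Z)) for _ in range(n_perms))
        ])
        # Permutation p-value with the standard +1 correction.
        p_values.append((1 + np.sum(null_stats >= stat)) / (1 + n_perms))
    return bool(min(p_values) <= alpha / len(bandwidths))
```

A usage sketch: for samples drawn from clearly different distributions (e.g. Gaussians with well-separated means), `mmd_agg_test(X, Y, bandwidths=[0.5, 1.0, 2.0])` should return `True`, while testing a sample against itself returns `False`. Aggregating over several bandwidths is what removes the need for held-out data or the median heuristic: no single bandwidth has to be picked in advance.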