Boosting the Power of Kernel Two-Sample Tests

The kernel two-sample test based on the maximum mean discrepancy (MMD) is one of the most popular methods for detecting differences between two distributions over general metric spaces. In this paper, we propose a method to boost the power of the kernel test by combining MMD estimates over multiple kernels using their Mahalanobis distance. We derive the asymptotic null distribution of the proposed test statistic and use a multiplier bootstrap approach to efficiently compute the rejection region. The resulting test is universally consistent and, since it is obtained by aggregating over a collection of kernels/bandwidths, is more powerful in detecting a wide range of alternatives in finite samples. We also derive the distribution of the test statistic for both fixed and local contiguous alternatives. The latter, in particular, implies that the proposed test is statistically efficient, that is, it has non-trivial asymptotic (Pitman) efficiency. Extensive numerical experiments are performed on both synthetic and real-world datasets to illustrate the efficacy of the proposed method over single kernel tests. Our asymptotic results rely on deriving the joint distribution of MMD estimates using the framework of multiple stochastic integrals. This framework is more broadly useful, in particular for understanding the efficiency properties of recently proposed adaptive MMD tests based on kernel aggregation.
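
As a rough illustration of the idea described above, the following is a minimal sketch and not the paper's exact estimator. It assumes equal sample sizes, Gaussian kernels with a hypothetical list of bandwidths, a simple plug-in covariance estimate under the null, and Gaussian multipliers for the bootstrap. The function names `gaussian_h_matrix` and `mahalanobis_mmd_test` are invented for this example.

```python
import numpy as np


def gaussian_h_matrix(x, y, bandwidth):
    """Return h_ij = k(x_i,x_j) + k(y_i,y_j) - k(x_i,y_j) - k(x_j,y_i), zero diagonal."""
    def gram(a, b):
        sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

    kxy = gram(x, y)
    h = gram(x, x) + gram(y, y) - kxy - kxy.T
    np.fill_diagonal(h, 0.0)  # U-statistic: drop the i == j terms
    return h


def mahalanobis_mmd_test(x, y, bandwidths, n_boot=500, seed=0):
    """Aggregate MMD estimates over several kernels via a Mahalanobis-type statistic.

    Assumption: x and y have the same number of rows. The covariance estimate
    and the multiplier scheme below are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    if y.shape[0] != n:
        raise ValueError("this sketch assumes equal sample sizes")

    hs = np.stack([gaussian_h_matrix(x, y, b) for b in bandwidths])  # shape (r, n, n)
    norm = n * (n - 1)

    # One unbiased MMD^2 estimate per kernel.
    mmd = hs.sum(axis=(1, 2)) / norm

    # Plug-in estimate of the null covariance of n * mmd (degenerate U-statistic).
    cov = 2.0 * np.einsum("aij,bij->ab", hs, hs) / norm
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against near-singularity

    stat = n**2 * mmd @ cov_inv @ mmd

    # Multiplier bootstrap: reweight h_ij by e_i * e_j with i.i.d. N(0, 1) multipliers.
    boot = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.standard_normal(n)
        mmd_b = np.einsum("i,aij,j->a", e, hs, e) / norm
        boot[b] = n**2 * mmd_b @ cov_inv @ mmd_b

    p_value = (1 + np.sum(boot >= stat)) / (n_boot + 1)
    return stat, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((100, 2))
    y = rng.standard_normal((100, 2)) + 1.0  # mean-shifted alternative
    print(mahalanobis_mmd_test(x, y, bandwidths=[0.5, 1.0, 2.0], n_boot=200))
```

The pseudo-inverse is used because MMD estimates from nearby bandwidths can be highly correlated, which makes the estimated covariance matrix close to singular. The bootstrap reuses the same kernel matrices, so each resample costs only a few matrix-vector products.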
