Motivated by the increasing use of kernel-based metrics for high-dimensional and large-scale data, we study the asymptotic behavior of kernel two-sample tests when the dimension and the sample sizes both diverge to infinity. We focus on the maximum mean discrepancy (MMD) with isotropic kernels, which includes MMD with the Gaussian and Laplace kernels, as well as the energy distance, as special cases. We derive asymptotic expansions of the kernel two-sample statistics, based on which we establish the central limit theorem (CLT) under the null hypothesis and under both local and fixed alternatives. The new non-null CLT results allow us to perform an asymptotically exact power analysis, which reveals a delicate interplay between the moment discrepancy that can be detected by the kernel two-sample tests and the orders of the dimension and the sample sizes. The asymptotic theory is further corroborated through numerical studies.
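To make the statistics named above concrete, the following is a minimal sketch of the standard unbiased (U-statistic) estimator of the squared MMD for an isotropic kernel, that is, a kernel depending on the two points only through their Euclidean distance. It is an illustration under stated assumptions and not the paper's own implementation: the function names (`pairwise_dist`, `mmd2_unbiased`, `gaussian`, `laplace`, `energy`) and the bandwidth choice `sqrt(p)` are hypothetical, introduced here for the example. The energy distance is recovered by taking the kernel to be the negative Euclidean distance.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between the rows of a (n x p) and b (m x p)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.sqrt(np.maximum(sq, 0.0))


def gaussian(bandwidth):
    """Gaussian kernel as a function of distance d (bandwidth is an assumed tuning parameter)."""
    return lambda d: np.exp(-d ** 2 / (2.0 * bandwidth ** 2))


def laplace(bandwidth):
    """Laplace kernel as a function of distance d."""
    return lambda d: np.exp(-d / bandwidth)


def energy(d):
    """Negative distance; plugging this in yields the energy distance."""
    return -d


def mmd2_unbiased(x, y, kernel):
    """Unbiased estimate of MMD^2 between samples x (n x p) and y (m x p).

    `kernel` maps a matrix of Euclidean distances to kernel values.
    """
    n, m = len(x), len(y)
    kxx = kernel(pairwise_dist(x, x))
    kyy = kernel(pairwise_dist(y, y))
    kxy = kernel(pairwise_dist(x, y))
    # Exclude the diagonal (i == j) terms from the within-sample averages.
    np.fill_diagonal(kxx, 0.0)
    np.fill_diagonal(kyy, 0.0)
    within_x = kxx.sum() / (n * (n - 1))
    within_y = kyy.sum() / (m * (m - 1))
    return within_x + within_y - 2.0 * kxy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 50
    x = rng.normal(size=(200, p))
    y = rng.normal(size=(200, p)) + 0.5  # mean-shifted alternative (illustrative)
    bw = np.sqrt(p)  # assumed bandwidth scaling, chosen only for illustration
    for name, k in [("gaussian", gaussian(bw)), ("laplace", laplace(bw)), ("energy", energy)]:
        print(name, mmd2_unbiased(x, y, k))
```

Turning such a statistic into a test additionally requires a studentizing variance estimate or a permutation calibration, which this sketch deliberately omits.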