The kernel Maximum Mean Discrepancy~(MMD) is a popular multivariate distance metric between distributions that has found utility in two-sample testing. The usual kernel-MMD test statistic is a degenerate U-statistic under the null, and thus it has an intractable limiting distribution. Hence, to design a level-$\alpha$ test, one usually selects the rejection threshold as the $(1-\alpha)$-quantile of the permutation distribution. The resulting nonparametric test has finite-sample validity but suffers from large computational cost, since every permutation takes quadratic time. We propose the cross-MMD, a new quadratic-time MMD test statistic based on sample-splitting and studentization. We prove that under mild assumptions, the cross-MMD has a limiting standard Gaussian distribution under the null. Importantly, we also show that the resulting test is consistent against any fixed alternative, and when using the Gaussian kernel, it has minimax rate-optimal power against local alternatives. For large sample sizes, our new cross-MMD provides a significant speedup over the MMD, for only a slight loss in power.
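
To make the sample-splitting and studentization idea concrete, the following is a minimal sketch in pure Python, not the authors' reference implementation. It assumes an even split of each sample, a Gaussian kernel with a fixed bandwidth, and a variance estimate built from the empirical variances of the first-half witness evaluations; the precise estimator and normalization in the paper may differ. All names (`gaussian_kernel`, `cross_mmd_test`, `bandwidth`) are hypothetical and chosen for illustration.

```python
import math
from statistics import NormalDist


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def cross_mmd_test(xs, ys, alpha=0.05, bandwidth=1.0):
    """Sketch of a studentized cross-MMD test of H0: P = Q.

    xs, ys: lists of equal-dimension points (tuples or lists of floats).
    Returns (studentized statistic, reject flag).
    """
    if min(len(xs), len(ys)) < 4:
        raise ValueError("need at least 4 points per sample to split")

    # Sample splitting: first halves are evaluated against second halves.
    x1, x2 = xs[: len(xs) // 2], xs[len(xs) // 2:]
    y1, y2 = ys[: len(ys) // 2], ys[len(ys) // 2:]

    def witness(z):
        # Empirical witness function built from the second halves only:
        # mean_k(z, x2) - mean_k(z, y2).
        kx = sum(gaussian_kernel(z, x, bandwidth) for x in x2) / len(x2)
        ky = sum(gaussian_kernel(z, y, bandwidth) for y in y2) / len(y2)
        return kx - ky

    # Witness evaluated on the first halves; this is the quadratic-time step.
    ux = [witness(x) for x in x1]
    uy = [witness(y) for y in y1]
    mean_ux = sum(ux) / len(ux)
    mean_uy = sum(uy) / len(uy)

    # Cross statistic: inner product of the two half-sample mean differences.
    cross = mean_ux - mean_uy

    # Studentization (assumed form): standard error of the difference of means.
    var_x = sum((u - mean_ux) ** 2 for u in ux) / len(ux)
    var_y = sum((u - mean_uy) ** 2 for u in uy) / len(uy)
    sigma = math.sqrt(var_x / len(ux) + var_y / len(uy))
    if sigma == 0.0:
        raise ValueError("degenerate sample: zero variance estimate")

    stat = cross / sigma
    # One-sided test against the standard Gaussian (1 - alpha)-quantile.
    threshold = NormalDist().inv_cdf(1.0 - alpha)
    return stat, stat > threshold
```

Because the threshold comes from the standard Gaussian quantile, no permutations are needed: the whole test costs a single quadratic-time pass over the kernel evaluations, which is where the speedup over the permutation-calibrated MMD test would come from.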