We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with the sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and an asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of the individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we use our theory to study two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to analyze. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth.
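To make the object of study concrete, the following is a minimal sketch of a degree-two U-statistic: the standard unbiased estimate of squared MMD from paired samples. The Gaussian RBF kernel, its bandwidth parametrization, and the function names `rbf_kernel` and `mmd_u_statistic` are assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def rbf_kernel(x, y, bandwidth):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 * bandwidth^2)) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_u_statistic(xs, ys, bandwidth):
    """Degree-two U-statistic estimate of squared MMD.

    Averages u(z_i, z_j) over all ordered pairs i != j, where z_i = (x_i, y_i)
    and u(z, z') = k(x, x') + k(y, y') - k(x, y') - k(x', y).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal size n >= 2")
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a U-statistic excludes the diagonal terms
            total += (
                rbf_kernel(xs[i], xs[j], bandwidth)
                + rbf_kernel(ys[i], ys[j], bandwidth)
                - rbf_kernel(xs[i], ys[j], bandwidth)
                - rbf_kernel(xs[j], ys[i], bandwidth)
            )
    return total / (n * (n - 1))
```

Here both the dimension $d$ (the length of each sample vector) and the bandwidth enter only through the kernel, which are the two quantities whose effect on test power the abstract refers to.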