Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been devoted to understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics, along with sample splitting and self-normalization, to produce a refined test statistic with a Gaussian limiting distribution regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems, including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
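To make the sample-splitting construction concrete for the one-sample mean problem, below is a minimal Python sketch of one way to read the recipe in the abstract: estimate a projection direction from one half of the data, project the other half onto it, and studentize the resulting mean, which yields a statistic that is asymptotically $N(0,1)$ under the null irrespective of $d$. The function name `cross_mean_test`, the even split, and the one-sided rejection rule are illustrative assumptions, not the authors' reference implementation.

```python
# A minimal sketch (assumptions noted above), illustrating the
# split-project-studentize idea for testing H0: E[X] = 0.
import numpy as np
from scipy.stats import norm

def cross_mean_test(X, seed=0):
    """One-sample mean test via sample splitting and self-normalization.

    X : (n, d) array of i.i.d. observations.
    Returns the studentized statistic and a one-sided p-value.
    """
    rng = np.random.default_rng(seed)
    X = X[rng.permutation(len(X))]   # random split into two halves
    n1 = len(X) // 2
    X1, X2 = X[:n1], X[n1:]

    # Plug-in direction estimated from the second half only.
    direction = X2.mean(axis=0)

    # Project the first half onto that direction: conditional on X2,
    # these n1 scalars are i.i.d., so a one-dimensional CLT applies.
    h = X1 @ direction

    # Self-normalization: studentize by the empirical standard deviation.
    T = np.sqrt(n1) * h.mean() / h.std(ddof=1)
    return T, norm.sf(T)             # reject H0 for large T

# Example: n = 100 samples in d = 20 dimensions, as in the abstract.
X = np.random.default_rng(1).normal(size=(100, 20))
T, pval = cross_mean_test(X)
print(f"T = {T:.3f}, p-value = {pval:.3f}")
```

Note how the statistic avoids the diagonal blocks of the full U-statistic: only cross terms $\langle X_i, X_j \rangle$ with $i$ and $j$ in different halves enter, which is what restores the Gaussian limit.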