Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated to understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption about $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics, along with sample splitting and self-normalization, to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems, including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
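As a minimal sketch of the sample-splitting and self-normalization idea for one-sample mean testing (variable names and the specific plug-in direction are illustrative assumptions, not the paper's exact construction): the second half of the data estimates a witness direction, the first half is projected onto it, and the studentized average of the projections is asymptotically standard normal under the null, for any relationship between $d$ and $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.standard_normal((n, d))  # data under the null: mean zero

# Sample splitting: the second half supplies a plug-in direction,
# the first half is projected onto it (hypothetical names).
X1, X2 = X[: n // 2], X[n // 2 :]
direction = X2.mean(axis=0)   # witness direction from the held-out half
h = X1 @ direction            # h_i = X_i^T (mean of second half)

# Self-normalized (studentized) cross statistic: under the null this
# converges to N(0, 1) regardless of how d scales with n.
T = np.sqrt(len(h)) * h.mean() / h.std(ddof=1)
print(T)
```

Rejecting when $T$ exceeds a standard Gaussian quantile then gives a dimension-agnostic test; the projections $h_i$ are i.i.d. conditionally on the second half, which is what drives the universal Gaussian limit.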