Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for a handful of classical problems, including one-sample mean and covariance testing. Our tests are shown to have minimax rate-optimal power against appropriate local alternatives, and this power is optimal up to a $\sqrt{2}$ factor. We end by suggesting some next steps for extending dimension-agnostic inference to other problems.
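The sample-splitting and self-normalization idea can be illustrated for the one-sample mean-testing problem mentioned above. The sketch below is a hedged, minimal interpretation (function name, splitting scheme, and one-sided p-value are illustrative assumptions, not the paper's exact procedure): one half of the data estimates a direction, the other half is projected onto it, and the projected values are studentized so that the statistic is asymptotically standard normal under the null regardless of $d$ versus $n$.

```python
import numpy as np
from scipy.stats import norm

def dim_agnostic_mean_test(X, seed=0):
    """Illustrative split-and-studentize test of H0: E[X] = 0.

    A hypothetical sketch of the dimension-agnostic recipe: the second
    half of the sample supplies a data-driven direction, the first half
    is projected onto it, and the projections are self-normalized.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)
    X1, X2 = X[idx[: n // 2]], X[idx[n // 2:]]
    direction = X2.mean(axis=0)        # direction estimated from one half
    proj = X1 @ direction              # project the other half onto it
    # Self-normalized statistic: asymptotically N(0, 1) under the null.
    T = np.sqrt(len(proj)) * proj.mean() / proj.std(ddof=1)
    return T, 1.0 - norm.cdf(T)        # one-sided p-value
```

Because the two halves are independent, the projections behave like i.i.d. scalars conditional on the estimated direction, which is what yields the Gaussian limit without any assumption relating $d$ and $n$.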