The classical likelihood ratio test (LRT), based on the asymptotic chi-squared distribution of the log-likelihood ratio, is one of the fundamental tools of statistical inference. A recent universal LRT approach based on sample splitting provides valid hypothesis tests and confidence sets in any setting for which we can compute the split likelihood ratio statistic (or, more generally, an upper bound on the null maximum likelihood). The universal LRT is valid in finite samples and without regularity conditions. This test empowers statisticians to construct tests in settings for which no valid hypothesis test previously existed. For the simple but fundamental case of testing the population mean of d-dimensional Gaussian data with identity covariance matrix, the classical LRT itself applies. Thus, this setting serves as a perfect test bed to compare the classical LRT against the universal LRT. This work presents the first in-depth exploration of the size and power of several universal LRT variants and of the relationships among them. We show that a repeated subsampling approach is the best choice in terms of size and power. For large numbers of subsamples, the resulting confidence set is approximately spherical. We observe reasonable performance even in a high-dimensional setting, where the expected squared radius of the best universal LRT's confidence set is approximately 3/2 times the squared radius of the classical LRT's spherical confidence set. We illustrate the benefits of the universal LRT through testing a non-convex doughnut-shaped null hypothesis, where a universal inference procedure can have higher power than a standard approach.
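As a minimal sketch of the sample-splitting idea in the Gaussian setting described above, the following Python code tests a point null H0: mu = mu0 for data assumed to be N(mu, I). The function names `split_log_stat` and `universal_lrt_reject`, the half-and-half split, and the defaults are illustrative assumptions, not taken from the paper. The averaging over `n_splits` random splits is assumed here to correspond to the repeated subsampling variant. The test rejects when the (averaged) split likelihood ratio is at least 1/alpha.

```python
import numpy as np


def split_log_stat(x, mu0, rng):
    """Log split likelihood ratio for H0: mu = mu0, assuming N(mu, I) data."""
    n = x.shape[0]
    perm = rng.permutation(n)
    d0, d1 = x[perm[: n // 2]], x[perm[n // 2:]]
    mu1_hat = d1.mean(axis=0)  # alternative estimate, fitted on D1 only
    # Gaussian log-likelihoods evaluated on D0; additive constants cancel.
    ll_alt = -0.5 * np.sum((d0 - mu1_hat) ** 2)
    ll_null = -0.5 * np.sum((d0 - mu0) ** 2)
    return ll_alt - ll_null


def universal_lrt_reject(x, mu0, alpha=0.1, n_splits=1, seed=0):
    """Reject H0 if the average split likelihood ratio is at least 1/alpha."""
    rng = np.random.default_rng(seed)
    logs = np.array([split_log_stat(x, mu0, rng) for _ in range(n_splits)])
    # Log of the mean ratio, computed stably in log space.
    log_mean = np.logaddexp.reduce(logs) - np.log(n_splits)
    return bool(log_mean >= np.log(1.0 / alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(loc=0.5, size=(200, 3))  # hypothetical data, true mean 0.5
    print(universal_lrt_reject(x, mu0=np.zeros(3), alpha=0.1, n_splits=50))
```

Working in log space avoids overflow when the likelihood ratio is very large, which happens quickly when the true mean is far from mu0.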