The classical likelihood ratio test (LRT), based on the asymptotic chi-squared distribution of the log-likelihood ratio statistic, is one of the fundamental tools of statistical inference. A recent universal LRT approach based on sample splitting provides valid hypothesis tests and confidence sets in any setting for which we can compute the split likelihood ratio statistic (or, more generally, an upper bound on the null maximum likelihood). The universal LRT is valid in finite samples and requires no regularity conditions. It allows statisticians to construct tests in settings for which no valid hypothesis test previously existed. For the simple but fundamental case of testing the population mean of d-dimensional Gaussian data with identity covariance matrix, the classical LRT itself applies. This setting therefore serves as a perfect test bed for comparing the classical LRT against the universal LRT. This work presents the first in-depth exploration of the size and power of several universal LRT variants and of the relationships among them. We show that a repeated subsampling approach is the best choice in terms of size and power. For large numbers of subsamples, the repeated subsampling confidence set is approximately spherical. We observe reasonable performance even in a high-dimensional setting, where the expected squared radius of the best universal LRT's confidence set is approximately 3/2 times the squared radius of the classical LRT's spherical confidence set. We illustrate the benefits of the universal LRT by testing a non-convex doughnut-shaped null hypothesis, for which a universal inference procedure can have higher power than a standard approach.
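To make the construction concrete, the following is a minimal sketch of a split LRT for the point null H0: theta = theta0 with N(theta, I_d) data, together with a repeated-subsampling variant. It is an illustration under stated assumptions, not the authors' implementation: the function names (`log_split_lr`, `split_lrt_rejects`, `subsampled_lrt_rejects`), the half-and-half split, the default level `alpha=0.1`, and the number of splits are all hypothetical choices, and the subsampling variant is assumed to average the split statistic over random splits. Validity rests on the split statistic having expectation at most 1 under the null, so Markov's inequality gives a level-alpha test when rejecting at threshold 1/alpha.

```python
import math
import random


def log_split_lr(data, theta0, rng):
    """Log split likelihood ratio for H0: mean == theta0, N(theta, I_d) data.

    `data` is a list of d-dimensional points (lists of floats).
    The half-and-half split is an assumption of this sketch.
    """
    idx = list(range(len(data)))
    rng.shuffle(idx)
    half = len(idx) // 2
    d0 = [data[i] for i in idx[:half]]  # half used to evaluate the likelihood
    d1 = [data[i] for i in idx[half:]]  # half used to estimate the mean
    dim = len(theta0)

    # MLE of the mean computed on the estimation half only.
    theta1 = [sum(y[j] for y in d1) / len(d1) for j in range(dim)]

    # log L0(theta1) - log L0(theta0) on the evaluation half.
    # With identity covariance the normalizing constants cancel.
    log_t = 0.0
    for y in d0:
        sq_null = sum((y[j] - theta0[j]) ** 2 for j in range(dim))
        sq_alt = sum((y[j] - theta1[j]) ** 2 for j in range(dim))
        log_t += 0.5 * (sq_null - sq_alt)
    return log_t


def split_lrt_rejects(data, theta0, alpha=0.1, seed=0):
    """Reject H0 when the split likelihood ratio is at least 1/alpha."""
    rng = random.Random(seed)
    return log_split_lr(data, theta0, rng) >= math.log(1.0 / alpha)


def subsampled_lrt_rejects(data, theta0, alpha=0.1, n_splits=50, seed=0):
    """Average the split statistic over repeated random splits, then threshold.

    Assumption: the subsampling variant is the plain average of the
    split statistics, which still has null expectation at most 1.
    """
    rng = random.Random(seed)
    logs = [log_split_lr(data, theta0, rng) for _ in range(n_splits)]
    # Log of the mean statistic, computed stably with log-sum-exp.
    m = max(logs)
    log_mean = m + math.log(sum(math.exp(v - m) for v in logs) / n_splits)
    return log_mean >= math.log(1.0 / alpha)
```

Working on the log scale avoids overflow when the alternative is far from the null, where the likelihood ratio itself can be astronomically large. A confidence set follows by inverting the test, that is, by collecting every `theta0` the procedure fails to reject.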