Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, to name just a few applications. Testing the GoF of a simple (point) null hypothesis gives an analyst great flexibility in the choice of test statistic while still ensuring validity, but most GoF tests for composite null hypotheses are far more constrained, because the test statistic must have a tractable distribution over the entire null model space. A notable exception is co-sufficient sampling (CSS): resampling the data conditional on a sufficient statistic for the null model guarantees valid GoF testing using any test statistic the analyst chooses. However, CSS testing requires the null model to have a compact (in an information-theoretic sense) sufficient statistic, which holds only for a very limited class of models; even for a null model as simple as logistic regression, CSS testing is powerless. In this paper, we leverage the concept of approximate sufficiency to generalize CSS testing to essentially any parametric model with an asymptotically efficient estimator; we call our extension "approximate CSS" (aCSS) testing. We quantify the finite-sample Type I error inflation of aCSS testing and show that it vanishes under standard maximum likelihood asymptotics, for any choice of test statistic. We apply our proposed procedure both theoretically and in simulation to a number of models of interest to demonstrate its finite-sample Type I error and power.
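To make the CSS idea concrete, the following is a minimal sketch of an exact CSS test for one textbook case, not the aCSS procedure proposed in the paper. It assumes a null model of i.i.d. Poisson counts with unknown rate, chosen here purely for illustration: the sum of the counts is sufficient, and conditional on that sum the data are multinomial with equal cell probabilities, so copies can be drawn without knowing the rate. The function names (`dispersion`, `sample_given_sum`, `css_pvalue`) and the variance-to-mean test statistic are hypothetical choices; any other statistic could be passed in.

```python
import random


def dispersion(x):
    """Variance-to-mean ratio; large values suggest overdispersion."""
    n = len(x)
    mean = sum(x) / n
    if mean == 0:
        return 0.0
    var = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    return var / mean


def sample_given_sum(total, n, rng):
    """Draw n counts from their null distribution conditional on the sum.

    Under an i.i.d. Poisson null this is multinomial(total, 1/n, ..., 1/n):
    each of the `total` units falls into one of n cells uniformly at random.
    """
    counts = [0] * n
    for _ in range(total):
        counts[rng.randrange(n)] += 1
    return counts


def css_pvalue(x, stat=dispersion, n_copies=999, seed=0):
    """Monte Carlo CSS p-value for an arbitrary test statistic `stat`.

    Illustrative only: assumes the i.i.d. Poisson null described above.
    """
    rng = random.Random(seed)
    total, n = sum(x), len(x)
    t_obs = stat(x)
    # Count the copies whose statistic is at least as extreme as the observed one.
    exceed = sum(
        stat(sample_given_sum(total, n, rng)) >= t_obs
        for _ in range(n_copies)
    )
    # The +1 terms count the observed data as one of the exchangeable draws.
    return (1 + exceed) / (1 + n_copies)


if __name__ == "__main__":
    # Hypothetical data: mostly zeros with a few large counts.
    x = [0, 0, 0, 0, 0, 0, 0, 0, 0, 30] * 3
    print(css_pvalue(x))
```

Because every copy shares the observed value of the sufficient statistic, the observed data and the copies are exchangeable under the null whatever the unknown rate is, which is why the p-value is valid for any statistic. The same construction is unavailable when no compact sufficient statistic exists, which is the gap that approximate sufficiency is meant to address.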