In this work, we consider the problem of goodness-of-fit (GoF) testing for parametric models. Because the values of the model parameters are unknown, this testing problem involves a composite null hypothesis. In some special cases, co-sufficient sampling (CSS) can remove the influence of these unknown parameters by conditioning on a sufficient statistic, often the maximum likelihood estimator (MLE) of the unknown parameters. However, many common parametric settings do not permit this approach, since conditioning on a sufficient statistic leads to a powerless test. The recent approximate co-sufficient sampling (aCSS) framework of Barber and Janson (2022) offers an alternative. It replaces exact sufficiency with approximate sufficiency, conditioning instead on an approximately sufficient statistic (namely, a noisy version of the MLE). This approach recovers power in a range of settings where CSS cannot be applied. However, it requires the unconstrained MLE to be well-defined and well-behaved, which implicitly assumes a low-dimensional regime. We extend aCSS to the setting of constrained and penalized maximum likelihood estimation, so that more complex estimation problems can now be handled within the aCSS framework. Examples include mixtures of Gaussians, where the unconstrained MLE is not well-defined due to degeneracy, and high-dimensional Gaussian linear models, where the MLE can perform well under regularization such as an $\ell_1$ penalty or a shape constraint.
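As an illustration of plain CSS in one of the "special cases" where it works, the sketch below assumes the null model $X_1, \dots, X_n \overset{\text{iid}}{\sim} N(\mu, 1)$ with $\mu$ unknown and the variance known. Under this model the sample mean is sufficient, and the conditional law of $X$ given $\bar X$ is $\bar X \mathbf{1} + (I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top) Z$ with $Z \sim N(0, I_n)$. Copies drawn from this law are exchangeable with $X$ under the null, so ranking a test statistic among them gives a valid p-value. The function names and the choice of the sample variance as the test statistic are illustrative assumptions, not taken from the text.

```python
import numpy as np


def css_copies_gaussian_mean(x, num_copies, rng):
    """Draw copies of x from the law of X given its sample mean, under the
    (assumed) null model X_i ~ N(mu, 1) i.i.d. with mu unknown."""
    n = x.shape[0]
    xbar = x.mean()
    z = rng.standard_normal((num_copies, n))
    # Centering each row projects z onto the orthogonal complement of the ones vector
    z_centered = z - z.mean(axis=1, keepdims=True)
    # Every copy has exactly the same sample mean as x
    return xbar + z_centered


def css_pvalue(x, statistic, num_copies=500, seed=0):
    """Rank-based CSS p-value; large statistic values count as evidence against H0."""
    rng = np.random.default_rng(seed)
    copies = css_copies_gaussian_mean(x, num_copies, rng)
    t_obs = statistic(x)
    t_copies = np.array([statistic(c) for c in copies])
    return (1 + np.sum(t_copies >= t_obs)) / (num_copies + 1)


def sample_variance(x):
    return np.var(x, ddof=1)


# Usage: data with variance 4 instead of 1 should be rejected
x_alt = np.random.default_rng(1).normal(3.0, 2.0, 100)
p_alt = css_pvalue(x_alt, sample_variance)
```

In this Gaussian case, the conditional law of $X$ given $\bar X$ is still nondegenerate, which is what makes CSS usable here. When conditioning on the sufficient statistic pins the data down almost completely, the copies carry no information and the resulting test has no power.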
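To make the "noisy version of the MLE" concrete in the penalized setting, the sketch below computes an $\ell_1$-penalized, randomly perturbed MLE for a Gaussian linear model $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \nu^2 I)$. The following choices are our assumptions and are not taken from the text: the noise level $\nu$ is known, and the objective has the form
$\hat\beta = \arg\min_\beta \{ \|y - X\beta\|_2^2 / (2\nu^2) + \lambda \|\beta\|_1 + \sigma W^\top \beta \}$ with $W \sim N(0, I_d)$.
The helpers `noisy_lasso_mle` and `soft_threshold` are hypothetical. The solver is plain proximal gradient descent (ISTA), chosen only for simplicity. Only the estimation step is shown. The subsequent aCSS step, which samples copies conditional on this estimate, is omitted.

```python
import numpy as np


def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||.||_1 (coordinatewise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def noisy_lasso_mle(X, y, lam, sigma, noise_sd=1.0, w=None, rng=None,
                    n_iter=5000, tol=1e-10):
    """Hypothetical helper: solve
        argmin_beta ||y - X beta||^2 / (2 noise_sd^2) + lam ||beta||_1 + sigma w^T beta
    by proximal gradient descent (ISTA). Draws w ~ N(0, I_d) if not given."""
    _, d = X.shape
    if w is None:
        rng = np.random.default_rng() if rng is None else rng
        w = rng.standard_normal(d)
    # Lipschitz constant of the smooth part's gradient: (largest singular value)^2 / noise_sd^2
    lipschitz = np.linalg.norm(X, 2) ** 2 / noise_sd**2
    step = 1.0 / lipschitz
    beta = np.zeros(d)
    for _ in range(n_iter):
        # Gradient of the negative log-likelihood plus the linear perturbation term
        grad = -X.T @ (y - X @ beta) / noise_sd**2 + sigma * w
        beta_new = soft_threshold(beta - step * grad, step * lam)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, w


# Usage: sparse truth in d = 20 dimensions
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = 2.0
y = X @ beta_true + rng.standard_normal(100)
beta_hat, w = noisy_lasso_mle(X, y, lam=20.0, sigma=1.0, rng=rng)
```

The optimality (KKT) conditions give a quick sanity check. At the solution, $X^\top(y - X\hat\beta)/\nu^2 - \sigma W$ equals $\lambda\,\mathrm{sign}(\hat\beta_j)$ on nonzero coordinates and has absolute value at most $\lambda$ on the remaining coordinates.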