In this work, we consider the problem of goodness-of-fit (GoF) testing for parametric models -- for example, testing whether observed data follows a logistic regression model. This testing problem involves a composite null hypothesis, due to the unknown values of the model parameters. In some special cases, co-sufficient sampling (CSS) can remove the influence of these unknown parameters via conditioning on a sufficient statistic -- often, the maximum likelihood estimator (MLE) of the unknown parameters. However, many common parametric settings (including logistic regression) do not permit this approach, since conditioning on a sufficient statistic leads to a powerless test. The recent approximate co-sufficient sampling (aCSS) framework of Barber and Janson (2022) offers an alternative, replacing sufficiency with an approximately sufficient statistic (namely, a noisy version of the MLE). This approach recovers power in a range of settings where CSS cannot be applied, but can only be applied in settings where the unconstrained MLE is well-defined and well-behaved, which implicitly assumes a low-dimensional regime. In this work, we extend aCSS to the setting of constrained and penalized maximum likelihood estimation, so that more complex estimation problems can now be handled within the aCSS framework, including examples such as mixtures-of-Gaussians (where the unconstrained MLE is not well-defined due to degeneracy) and high-dimensional Gaussian linear models (where the MLE can perform well under regularization, such as an $\ell_1$ penalty or a shape constraint).
翻译:暂无翻译