Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to the development of a range of inference methods which directly account for this issue. However, whether these more involved methods are required depends on whether the model is really misspecified, and there is a lack of generally applicable methods to answer this question. One set of tools that can help is goodness-of-fit tests, which test whether a dataset could have been generated by a fixed distribution. Kernel-based tests have been developed for this problem, and these are popular due to their flexibility, strong theoretical guarantees and ease of implementation in a wide range of scenarios. In this paper, we extend this line of work to the more challenging composite goodness-of-fit problem, where we are instead interested in whether the data come from any distribution in some parametric family. This is equivalent to testing whether a parametric model is well-specified for the data.
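To make the simple (fixed-distribution) goodness-of-fit setting concrete, the following is a minimal, dependency-free sketch of one kernel-based test. It is an illustration under stated assumptions, not the procedure proposed in the paper: it assumes one-dimensional data, a Gaussian kernel with a fixed bandwidth, the maximum mean discrepancy (MMD) as the test statistic, and Monte Carlo calibration by repeatedly sampling from the null distribution. All function names (`gaussian_kernel`, `mmd_squared`, `simple_gof_test`, `standard_normal`) are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two scalars (the bandwidth is an assumed choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def simple_gof_test(data, sampler, n_boot=100, seed=0):
    """Test H0: `data` was drawn from the fixed distribution behind `sampler`.

    `sampler(rng, n)` must return n independent draws from the null
    distribution. The null distribution of the statistic is approximated by
    comparing pairs of fresh samples that both come from the null.
    """
    rng = random.Random(seed)
    n = len(data)
    observed = mmd_squared(data, sampler(rng, n))
    null_stats = [
        mmd_squared(sampler(rng, n), sampler(rng, n)) for _ in range(n_boot)
    ]
    # Monte Carlo p-value with the usual +1 correction.
    exceed = sum(stat >= observed for stat in null_stats)
    p_value = (1 + exceed) / (1 + n_boot)
    return observed, p_value


def standard_normal(rng, n):
    """Null distribution used in this example: N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    data_rng = random.Random(1)
    shifted = [data_rng.gauss(3.0, 1.0) for _ in range(40)]
    statistic, p_value = simple_gof_test(shifted, standard_normal)
    print(f"MMD^2 = {statistic:.3f}, p-value = {p_value:.3f}")
```

A small p-value is evidence against the fixed null distribution. The composite problem is harder because the null is a whole parametric family, so the parameter would first have to be estimated from the data and the calibration would have to account for that estimation step; the sketch above does not attempt this.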