We propose a goodness-of-fit measure for probability densities modelling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalised densities. Existing KSDs require the model to be defined on a fixed-dimension space. As our major contributions, we extend the KSD to the variable dimension setting by identifying appropriate Stein operators, and propose a novel KSD goodness-of-fit test. As with the previous variants, the proposed KSD does not require the density to be normalised, allowing the evaluation of a large class of models. Our test is shown to perform well in practice on discrete sequential data benchmarks.