We propose a goodness-of-fit measure for probability densities that model observations of varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. A KSD is defined by its Stein operator, and the operators currently used in testing apply to fixed-dimensional spaces. As our main contribution, we extend the KSD to the variable-dimension setting by identifying appropriate Stein operators, and we propose a novel KSD goodness-of-fit test. As with previous variants, the proposed KSD does not require the density to be normalized, which allows a large class of models to be evaluated. Our test is shown to perform well in practice on discrete sequential data benchmarks.
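To make the role of the Stein operator concrete, the following is a minimal sketch of a KSD estimate in the familiar fixed-dimensional case (one-dimensional, continuous). It is not the variable-dimension operator proposed here: it uses the standard Langevin Stein operator with a Gaussian (RBF) kernel, and the function name `ksd_u_statistic`, the bandwidth, and the sample sizes are illustrative assumptions. The model enters only through its score function, the derivative of the log density, so the normalizing constant is never needed.

```python
import numpy as np


def ksd_u_statistic(x, score, bandwidth=1.0):
    """U-statistic estimate of the squared KSD for 1-D samples.

    Assumptions (illustrative, not taken from the paper): Langevin Stein
    operator, RBF kernel k(x, y) = exp(-(x - y)^2 / (2 h^2)).
    `score` maps an array of points to d/dx log p(x); it is unchanged if
    the density p is multiplied by a constant.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    s = score(x)                      # score at each sample, shape (n,)
    d = x[:, None] - x[None, :]       # pairwise differences x_i - x_j
    h2 = bandwidth ** 2

    k = np.exp(-d ** 2 / (2.0 * h2))  # kernel matrix
    dk_dx = -d / h2 * k               # derivative in the first argument
    dk_dy = d / h2 * k                # derivative in the second argument
    d2k = (1.0 / h2 - d ** 2 / h2 ** 2) * k  # mixed second derivative

    # Stein kernel u_p(x_i, x_j)
    u = (s[:, None] * s[None, :] * k
         + s[:, None] * dk_dy
         + s[None, :] * dk_dx
         + d2k)

    # U-statistic: average over off-diagonal pairs only
    np.fill_diagonal(u, 0.0)
    return u.sum() / (n * (n - 1))


# Model: unnormalized standard normal, p(x) proportional to exp(-x^2 / 2).
def standard_normal_score(x):
    return -x


rng = np.random.default_rng(0)
ksd_null = ksd_u_statistic(rng.normal(0.0, 1.0, size=500), standard_normal_score)
ksd_shift = ksd_u_statistic(rng.normal(2.0, 1.0, size=500), standard_normal_score)
print(ksd_null, ksd_shift)
```

When the samples come from the model, the estimate is close to zero; when they come from a shifted distribution, it is clearly positive. A goodness-of-fit test would compare the statistic against a null distribution, for example one obtained by a bootstrap procedure, which this sketch omits.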