Density estimation is one of the core problems in statistics. Despite this, existing techniques such as maximum likelihood estimation are computationally inefficient because the normalizing constant is intractable. For this reason, interest has grown in score matching, which does not depend on the normalizing constant. However, this estimator is consistent only for distributions with full-space support. One approach to making it consistent is to add noise to the input data, which is called Denoising Score Matching. In this work we derive an analytical expression for Denoising Score Matching using the Kernel Exponential Family as the model distribution. The use of the kernel exponential family is motivated by the richness of this class of densities. To tackle the computational complexity, we use a Random Fourier Features based approximation of the kernel function. The analytical expression allows us to drop additional regularization terms based on higher-order derivatives, as they are already implicitly included. Moreover, the obtained expression depends explicitly on the noise variance, so the validation loss can be used directly to tune the noise level. Along with benchmark experiments, the model was tested on various synthetic distributions to study its behaviour in different cases. The empirical study shows quality comparable to that of competing approaches, while the proposed method is computationally faster. This speed enables scaling up to complex high-dimensional data.
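To illustrate the Random Fourier Features approximation mentioned above, the following is a minimal sketch for a Gaussian kernel, written with the Python standard library only. It is not the authors' implementation: the choice of a Gaussian kernel, the bandwidth `sigma`, and the function names `gaussian_kernel`, `sample_rff_params`, `rff_features`, and `rff_kernel` are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sample_rff_params(dim, num_features, sigma=1.0, seed=0):
    """Draw random frequencies and phases (hypothetical helper).

    Frequencies come from N(0, sigma^-2 I), the spectral density of the
    Gaussian kernel; phases are uniform on [0, 2*pi).
    """
    rng = random.Random(seed)
    omegas = [
        [rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
        for _ in range(num_features)
    ]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return omegas, phases


def rff_features(x, omegas, phases):
    """Map x to the feature vector z(x) = sqrt(2/D) * cos(omega^T x + b)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [
        scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
        for w, b in zip(omegas, phases)
    ]


def rff_kernel(x, y, omegas, phases):
    """Approximate k(x, y) by the inner product z(x)^T z(y)."""
    zx = rff_features(x, omegas, phases)
    zy = rff_features(y, omegas, phases)
    return sum(a * b for a, b in zip(zx, zy))


if __name__ == "__main__":
    omegas, phases = sample_rff_params(dim=3, num_features=4000)
    x, y = [0.1, -0.2, 0.3], [0.4, 0.0, -0.1]
    print("exact kernel:", gaussian_kernel(x, y))
    print("RFF estimate:", rff_kernel(x, y, omegas, phases))
```

In this sketch the kernel value is replaced by an inner product of finite-dimensional feature vectors, so quantities that would otherwise involve an n-by-n kernel matrix can be computed from an n-by-D feature matrix. The approximation error shrinks roughly as one over the square root of the number of features D.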