Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.
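The core idea of keeping the input space continuous can be illustrated with a minimal sketch: discrete tokens are mapped to continuous embedding vectors, which can then be corrupted with Gaussian noise at a continuous time step, just as in standard diffusion on images. All names and the linear noise schedule below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical sketch: embed discrete tokens into continuous space, then
# corrupt the embeddings with Gaussian noise at a continuous time t.
rng = np.random.default_rng(0)

vocab_size, embed_dim = 100, 8
# In practice these embeddings would be learned; random here for illustration.
embedding = rng.normal(size=(vocab_size, embed_dim))

def corrupt(tokens, t):
    """Noise token embeddings at a continuous time t in [0, 1]."""
    x0 = embedding[tokens]            # discrete tokens -> continuous vectors
    noise = rng.normal(size=x0.shape)
    return x0 + t * noise             # noise level grows with t (assumed schedule)

tokens = np.array([3, 17, 42])
x_t = corrupt(tokens, t=0.5)
print(x_t.shape)
```

Because the corrupted representation lives in a continuous space, the machinery of continuous-time diffusion (noise schedules, denoising networks) applies unchanged; only the final decoding back to discrete tokens is categorical.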