Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural images. This paper introduces two extensions of flows and diffusion to categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined as the composition of a continuous distribution (such as a normalizing flow) and an argmax function. To optimize this model, we learn a probabilistic inverse for the argmax that lifts the categorical data to a continuous space. Multinomial Diffusion gradually adds categorical noise in a diffusion process, for which the generative denoising process is learned. We demonstrate that our method outperforms existing dequantization approaches in log-likelihood on text modelling and on modelling of image segmentation maps.
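The two constructions above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a standard Gaussian stands in for the learned normalizing flow, and the noise schedule value `beta_t` is an arbitrary placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10  # number of categories (illustrative choice)

# --- Argmax Flow, generative direction ---
# A learned normalizing flow would supply continuous samples z in R^K;
# a standard Gaussian stands in for it here.
z = rng.normal(size=(5, K))
x = z.argmax(axis=-1)  # categorical samples in {0, ..., K-1}

# --- Multinomial Diffusion, one forward (noising) step ---
# Keep the current category with probability (1 - beta_t); otherwise
# resample uniformly over the K classes.
beta_t = 0.1  # placeholder noise-schedule value
x_onehot = np.eye(K)[x]
probs = (1 - beta_t) * x_onehot + beta_t / K
x_t = np.array([rng.choice(K, p=p) for p in probs])
```

Training then amounts to learning the probabilistic inverse of the argmax (for Argmax Flows) or the denoising distribution that reverses the noising step (for Multinomial Diffusion).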