Learning a categorical distribution comes with its own set of challenges. A successful approach taken by state-of-the-art works is to cast the problem in a continuous domain to take advantage of the impressive performance of generative models for continuous data. Among these are the recently emerging diffusion probabilistic models, which have been observed to generate high-quality samples. Recent advances in categorical generative models have focused on log-likelihood improvements. In this work, we propose a generative model for categorical data based on diffusion models with a focus on high-quality sample generation, and propose sample-based evaluation methods. The efficacy of our method stems from performing diffusion in the continuous domain while informing its parameterization with the categorical structure of the target distribution. Our evaluation method highlights the capabilities and limitations of different generative models for generating categorical data, and includes experiments on synthetic and real-world protein datasets.