This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. Our model uses a discrete diffusion process that progressively corrupts graphs with noise by adding or removing edges and changing node and edge categories. A graph transformer network is trained to reverse this process, which reduces the problem of learning a distribution over graphs to a sequence of node and edge classification tasks. We further improve sample quality by introducing a Markovian noise model that preserves the marginal distribution of node and edge types during diffusion, and by incorporating auxiliary graph-theoretic features. We also propose a procedure for conditioning generation on graph-level features. DiGress achieves state-of-the-art performance on molecular and non-molecular datasets, with up to a 3x improvement in validity on a planar graph dataset. It is also the first model to scale to the large GuacaMol dataset, which contains 1.3M drug-like molecules, without using molecule-specific representations.
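To make the idea of a marginal-preserving Markovian noise model concrete, the following is a minimal sketch, not the authors' implementation. It assumes one simple construction with the stated property: a transition matrix of the form `alpha * I + (1 - alpha) * 1 m^T`, where `m` is the empirical marginal over categories. The function names `marginal_transition` and `noise_step` and the parameter `alpha` are hypothetical and do not come from the abstract.

```python
import numpy as np


def marginal_transition(m, alpha):
    """Build a row-stochastic matrix Q with Q[i, j] = alpha * [i == j] + (1 - alpha) * m[j].

    ASSUMPTION: this specific form is illustrative; the abstract only states
    that the noise model preserves the marginal distribution of types.
    """
    m = np.asarray(m, dtype=float)
    k = m.size
    # Keep the current category with weight alpha, otherwise resample from m.
    return alpha * np.eye(k) + (1.0 - alpha) * np.tile(m, (k, 1))


def noise_step(x, Q, rng):
    """Apply one noising step to an integer array of categories x."""
    x = np.asarray(x)
    k = Q.shape[0]
    flat = x.ravel()
    # Each entry transitions independently according to its row of Q.
    out = np.array([rng.choice(k, p=Q[c]) for c in flat], dtype=int)
    return out.reshape(x.shape)


# Hypothetical marginal over three edge types (no edge, single, double).
m = np.array([0.8, 0.15, 0.05])
Q = marginal_transition(m, alpha=0.9)
rng = np.random.default_rng(0)
edges = rng.choice(3, size=(4, 4), p=m)
noisy_edges = noise_step(edges, Q, rng)
```

Because `m @ Q` equals `m` for this construction, categories that are distributed according to `m` before a step remain distributed according to `m` after it, which is the property the abstract describes. For an undirected graph, a real implementation would also need to noise only one triangle of the adjacency matrix and mirror it; that detail is omitted here.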