This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. Our model defines a diffusion process that progressively edits a graph with noise (adding or removing edges, changing the categories), and a graph transformer network that learns to reverse this process. With these two ingredients in place, we reduce distribution learning over graphs to a simple sequence of classification tasks. We further improve sample quality by proposing a new Markovian noise model that preserves the marginal distribution of node and edge types during diffusion, and by adding auxiliary graph-theoretic features derived from the noisy graph at each diffusion step. Finally, we propose a guidance procedure for conditioning the generation on graph-level features. Overall, DiGress achieves state-of-the-art performance on both molecular and non-molecular datasets, with up to a 3x improvement in validity on a dataset of planar graphs. In particular, it is the first model that scales to the large GuacaMol dataset containing 1.3M drug-like molecules without using a molecule-specific representation such as SMILES or fragments.
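To make the idea of a marginal-preserving Markovian noise model concrete, the following is a minimal sketch, not the authors' implementation. It assumes one simple construction with the stated property: a transition matrix that mixes the identity with a matrix whose rows all equal the empirical marginal `m`, so that `m` is a stationary distribution of the noise. The function names, the mixing weight `alpha`, and the example marginals are hypothetical and chosen only for illustration.

```python
import numpy as np


def marginal_transition(marginals, alpha):
    """Build Q = alpha * I + (1 - alpha) * 1 m^T (an assumed construction).

    Each row of Q sums to 1, and m @ Q == m, so applying this noise
    leaves the marginal distribution of categories unchanged.
    """
    m = np.asarray(marginals, dtype=float)
    m = m / m.sum()  # normalise in case raw counts were passed
    k = m.size
    # Broadcasting (k, 1) * (1, k) gives a matrix with every row equal to m.
    return alpha * np.eye(k) + (1.0 - alpha) * np.ones((k, 1)) * m[None, :]


def noise_step(labels, Q, rng):
    """Resample each categorical label from the row of Q that it indexes."""
    k = Q.shape[0]
    return np.array([rng.choice(k, p=Q[c]) for c in labels])


# Hypothetical marginals over three edge types (e.g. no edge, single, double).
m = np.array([0.7, 0.2, 0.1])
Q = marginal_transition(m, alpha=0.9)

rng = np.random.default_rng(0)
edge_types = np.array([0, 1, 2, 0])
noisy_edge_types = noise_step(edge_types, Q, rng)
```

With `alpha` close to 1 most labels are kept, and as `alpha` approaches 0 each label is redrawn from the marginal `m` regardless of its current value. A denoising network would then be trained to classify the clean label of every node and edge from the noisy graph, which is the sense in which the abstract describes generation as a sequence of classification tasks.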