Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and materials science. However, accurately modeling this distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDEs) and derive discrete graph structures as the condition for the reverse generative process. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, exploiting the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. In particular, the proposed method generates high-quality molecular graphs even with a limited number of sampling steps.
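The forward diffusion on node features and graph structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a variance-preserving SDE with a linear noise schedule, whose perturbation kernel has the closed form x_t = alpha_t * x_0 + sigma_t * eps, and it assumes that the discrete conditioning structure is obtained by thresholding the noisy adjacency matrix at 0.5. The function names (`vp_coeffs`, `perturb_graph`), the schedule constants, and the threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def vp_coeffs(t, beta_min=0.1, beta_max=20.0):
    """Return (alpha_t, sigma_t) for a VP-SDE with a linear beta schedule.

    The schedule constants are illustrative assumptions, not values
    taken from the paper.
    """
    # Integral of beta(s) = beta_min + s * (beta_max - beta_min) over [0, t].
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    alpha = np.exp(-0.5 * integral)
    sigma = np.sqrt(1.0 - alpha ** 2)
    return alpha, sigma


def perturb_graph(X0, A0, t, rng):
    """Sample a noisy graph state (X_t, A_t) and a discrete structure.

    X0: (n, d) node feature matrix.
    A0: (n, n) symmetric adjacency matrix with a zero diagonal.
    """
    alpha, sigma = vp_coeffs(t)

    # Node features receive independent Gaussian noise.
    Xt = alpha * X0 + sigma * rng.standard_normal(X0.shape)

    # Edge noise is symmetrized so that A_t stays an undirected graph state.
    noise = np.triu(rng.standard_normal(A0.shape), k=1)
    noise = noise + noise.T
    At = alpha * A0 + sigma * noise

    # Assumed quantization rule: threshold the noisy adjacency at 0.5.
    A_bar = (At > 0.5).astype(np.int64)
    np.fill_diagonal(A_bar, 0)
    return Xt, At, A_bar


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A three-node path graph with one-hot node features.
    A0 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    X0 = np.eye(3)
    for t in (0.01, 0.5, 1.0):
        _, _, A_bar = perturb_graph(X0, A0, t, rng)
        print(f"t={t}: discrete structure\n{A_bar}")
```

In this sketch the discrete structure stays close to the original graph for small t and approaches a random graph as t grows. A noise prediction model conditioned on such a structure, and an ODE solver for the reverse process, would be built on top of this and are not shown here.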