Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging because it requires not only generating chemically valid molecular structures but also optimizing their chemical properties at the same time. Inspired by recent progress in deep generative models, in this paper we propose GraphAF, a flow-based autoregressive model for graph generation. GraphAF combines the advantages of autoregressive and flow-based approaches and offers: (1) high model flexibility for data density estimation; (2) efficient parallel computation for training; (3) an iterative sampling process, which allows chemical domain knowledge to be used for valency checking. Experimental results show that 68% of the molecules GraphAF generates are chemically valid even without chemical knowledge rules, and 100% are valid with chemical rules. The training process of GraphAF is two times faster than that of the existing state-of-the-art approach, GCPN. After the model is fine-tuned for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.
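To illustrate how an iterative sampling process makes valency checking possible, here is a minimal Python sketch. It is not GraphAF's implementation: the `MAX_VALENCE` table is a simplified assumption, and `proposed_bonds` is a hypothetical stand-in for bonds that a generative model would sample one at a time.

```python
# Simplified maximum valences (an assumption for illustration; real chemistry
# also involves formal charges and hypervalent atoms).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


def bond_is_allowed(atoms, used_valence, i, j, order):
    """Return True if a bond of this order keeps both atoms within max valence."""
    return (used_valence[i] + order <= MAX_VALENCE[atoms[i]]
            and used_valence[j] + order <= MAX_VALENCE[atoms[j]])


def build_molecule(atoms, proposed_bonds):
    """Add proposed bonds one at a time, skipping any that violate valency.

    proposed_bonds is a hypothetical stand-in for model samples:
    a list of (i, j, order) tuples, processed in order.
    """
    used_valence = [0] * len(atoms)
    accepted = []
    for i, j, order in proposed_bonds:
        if bond_is_allowed(atoms, used_valence, i, j, order):
            used_valence[i] += order
            used_valence[j] += order
            accepted.append((i, j, order))
        # A rejected proposal is simply dropped here; a real sampler
        # might resample the bond type instead.
    return accepted


atoms = ["C", "O", "O", "F"]
proposals = [(0, 1, 2), (0, 2, 2), (0, 3, 1), (1, 2, 1)]
bonds = build_molecule(atoms, proposals)
print(bonds)  # [(0, 1, 2), (0, 2, 2)]
```

In this example the two C=O double bonds are accepted, after which the carbon has no valence left for the C-F bond and the oxygens have none left for the O-O bond, so both later proposals are rejected. Because each bond is checked as it is proposed, every intermediate graph stays valid, which is the property a one-shot generator cannot easily enforce.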