Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has only recently been explored. In this work, we propose a new graph-based model that naturally supports scaffolds as the initial seed of the generative procedure, which is possible because our model is not conditioned on the generation history. At the same time, our generation procedure can flexibly choose between adding individual atoms and entire fragments. We show that training with a randomized generation order is necessary for good performance when extending scaffolds, and that the results are further improved by increasing the fragment vocabulary size. Our model advances the state of the art in graph-based molecule generation, while being an order of magnitude faster to train and sample from than existing approaches.
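To make the scaffold-seeding idea concrete, the following is a minimal toy sketch, not the model proposed here. It assumes a hypothetical `PartialMolecule` container, a hypothetical three-atom and two-fragment vocabulary, and a random stand-in `propose_next` in place of a learned policy. The only point it illustrates is that when the next step depends solely on the current partial graph, and not on the sequence of steps that built it, a scaffold can be passed in directly as the starting state.

```python
import random
from dataclasses import dataclass, field

# Hypothetical vocabulary for illustration only: single atoms and whole fragments.
ATOMS = ["C", "N", "O"]
FRAGMENTS = ["c1ccccc1", "C(=O)O"]
STOP = "<stop>"


@dataclass
class PartialMolecule:
    """Toy partial graph: node labels plus (parent, child) index pairs."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)


def propose_next(state, rng, max_nodes=6):
    """Stand-in for a learned policy (assumption, chosen at random here).

    It reads only the current state, never the generation history,
    so any partial graph, including a scaffold, is a valid input.
    """
    if len(state.nodes) >= max_nodes:
        return STOP
    return rng.choice(ATOMS + FRAGMENTS + [STOP])


def extend(scaffold, rng, max_steps=20):
    """Grow a molecule starting from the scaffold, one atom or fragment per step."""
    # Copy so the caller's scaffold is left untouched.
    state = PartialMolecule(list(scaffold.nodes), list(scaffold.edges))
    for _ in range(max_steps):
        choice = propose_next(state, rng)
        if choice == STOP:
            break
        new_index = len(state.nodes)
        state.nodes.append(choice)
        if new_index > 0:
            # Attach the new atom or fragment to a random existing node.
            state.edges.append((rng.randrange(new_index), new_index))
    return state


if __name__ == "__main__":
    seed = PartialMolecule(nodes=["c1ccccc1"])
    print(extend(seed, random.Random(0)))
```

Calling `extend` with an empty `PartialMolecule` corresponds to unconstrained generation, while passing a non-empty one corresponds to scaffold extension; the same loop serves both cases because nothing in it inspects how the starting state was produced.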