Recent advances in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom by atom and bond by bond, or fragment by fragment. However, many drug discovery projects also require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has only recently been explored. In this work, we propose a new graph-based model that learns to extend a given partial molecule by flexibly choosing between adding individual atoms and entire fragments. Extending a scaffold is implemented by using it as the initial partial graph, which is possible because our model does not depend on generation history. We show that training with a randomized generation order is necessary for good performance when extending scaffolds, and that the results are further improved by increasing the fragment vocabulary size. Our model advances the state of the art in graph-based molecule generation, while being an order of magnitude faster to train and sample from than existing approaches.
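The two mechanisms named above, history-independent extension starting from a scaffold and training on randomized generation orders, can be illustrated with a small sketch. This is not the authors' model: in the paper the choice of the next step is made by a learned graph network, whereas here `toy_policy` is a hand-written, hypothetical stand-in, and `PartialGraph`, `FRAGMENTS`, `apply_action`, `extend` and `random_generation_order` are illustrative names invented for this example (bond types, valences and ring closures are ignored). The key point it shows is that the policy sees only the current partial graph, so any scaffold can serve as the starting state.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PartialGraph:
    """Toy molecular graph: atom symbols plus undirected bonds as index pairs."""
    atoms: list[str] = field(default_factory=list)
    bonds: list[tuple[int, int]] = field(default_factory=list)

    def copy(self) -> "PartialGraph":
        return PartialGraph(list(self.atoms), list(self.bonds))


# Hypothetical fragment vocabulary: motif name -> (atoms, internal bonds).
FRAGMENTS = {
    "benzene": (["C"] * 6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]),
}

Policy = Callable[[PartialGraph], tuple]


def apply_action(g: PartialGraph, action: tuple) -> PartialGraph:
    """Add either a single atom or a whole fragment, optionally bonded to an anchor atom."""
    g = g.copy()
    kind, name, anchor = action
    offset = len(g.atoms)
    if kind == "atom":
        g.atoms.append(name)
    else:  # "fragment": copy the motif and shift its internal bond indices
        frag_atoms, frag_bonds = FRAGMENTS[name]
        g.atoms.extend(frag_atoms)
        g.bonds.extend((offset + i, offset + j) for i, j in frag_bonds)
    if anchor is not None:
        g.bonds.append((anchor, offset))  # attach via the first new atom
    return g


def extend(scaffold: PartialGraph, policy: Policy, max_steps: int = 20) -> PartialGraph:
    """Grow a partial graph; the policy sees only the current graph, never the history."""
    g = scaffold.copy()
    for _ in range(max_steps):
        action = policy(g)
        if action[0] == "stop":
            break
        g = apply_action(g, action)
    return g


def toy_policy(g: PartialGraph) -> tuple:
    """Hypothetical stand-in for a learned model: a fixed rule on the current graph size."""
    n = len(g.atoms)
    if n >= 9:
        return ("stop",)
    if n == 3:
        return ("fragment", "benzene", n - 1)
    return ("atom", "C" if n % 2 else "O", n - 1 if n else None)


def random_generation_order(num_atoms: int, bonds, rng: random.Random) -> list[int]:
    """Random atom order in which every prefix stays connected (assumes a connected graph)."""
    neighbors = {i: set() for i in range(num_atoms)}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)
    start = rng.randrange(num_atoms)
    order, visited, frontier = [start], {start}, set(neighbors[start])
    while frontier:
        nxt = rng.choice(sorted(frontier))  # pick any atom adjacent to the visited set
        order.append(nxt)
        visited.add(nxt)
        frontier |= neighbors[nxt]
        frontier -= visited
    return order
```

For example, `extend(PartialGraph(["C", "C", "N"], [(0, 1), (1, 2)]), toy_policy)` keeps the three scaffold atoms in place and attaches a benzene fragment to the nitrogen, while `extend(PartialGraph(), toy_policy)` builds from scratch with the same policy. Because `extend` carries no state besides the graph itself, "start from a scaffold" and "start from nothing" are the same code path. In this sketch, sampling several orders with `random_generation_order` for one training molecule would be one way to expose a model to many different partial graphs rather than a single canonical sequence.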