Retrosynthesis is a major task in drug discovery. Many existing approaches formulate it as a graph generation problem. Specifically, these methods first identify the reaction center and break the target molecule accordingly to generate synthons. Reactants are then generated either by adding atoms sequentially to the synthon graphs or by directly attaching suitable leaving groups. However, both strategies have drawbacks: adding atoms yields a long prediction sequence, which increases generation difficulty, while adding leaving groups can only consider those seen in the training set, which results in poor generalization. In this paper, we propose a novel end-to-end graph generation model for retrosynthesis prediction, which sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants. Since chemically meaningful motifs are larger than atoms and smaller than leaving groups, our method has lower prediction complexity than adding atoms and better generalization than adding leaving groups. Experiments on a benchmark dataset show that the proposed model significantly outperforms previous state-of-the-art algorithms.
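To make the three stages concrete (reaction center, synthons, motif attachment), the following is a minimal illustrative sketch on a toy molecular graph. It is not the proposed model: the reaction center and the motif are supplied by hand rather than predicted, bond orders and hydrogens are ignored, and the function names `break_bond` and `attach_motif` are hypothetical helpers introduced only for this example.

```python
from collections import defaultdict


def break_bond(num_atoms, bonds, center):
    """Remove the reaction-center bond and return (remaining_bonds, synthons).

    Each synthon is a sorted list of atom indices forming one connected
    component of the graph after the bond is removed.
    """
    remaining = [b for b in bonds if set(b) != set(center)]
    adjacency = defaultdict(set)
    for u, v in remaining:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, synthons = set(), []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        synthons.append(sorted(component))
    return remaining, synthons


def attach_motif(atoms, bonds, attach_at, motif_atoms, motif_bonds, motif_anchor=0):
    """Append a motif subgraph and connect its anchor atom to `attach_at`."""
    offset = len(atoms)
    new_atoms = atoms + motif_atoms
    new_bonds = bonds + [(u + offset, v + offset) for u, v in motif_bonds]
    new_bonds.append((attach_at, motif_anchor + offset))
    return new_atoms, new_bonds


# Toy target: heavy-atom skeleton of methyl acetate, CH3-C(=O)-O-CH3.
atoms = ["C", "C", "O", "O", "C"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]

# Stage 1 (assumed given here): the ester C-O bond is the reaction center.
center = (1, 3)

# Stage 2: break the bond to obtain the synthons.
remaining, synthons = break_bond(len(atoms), bonds, center)

# Stage 3: attach a one-atom motif (a hydroxyl oxygen) to the acyl carbon.
new_atoms, new_bonds = attach_motif(atoms, remaining, 1, ["O"], [])
```

In this sketch a motif is added in a single step, however many atoms it contains, whereas atom-by-atom generation would need one step per atom; this is the sense in which motif-level generation shortens the prediction sequence.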