Retrosynthetic planning plays a critical role in drug discovery and organic chemistry. Starting from a target molecule as the root node, it aims to find a complete reaction tree in which every leaf node belongs to a given set of starting materials. Multi-step reactions are crucial because they determine the process flow of production in the organic chemical industry. However, existing datasets are not curated for tree-structured multi-step reactions and do not provide such reaction trees, which limits models' understanding of organic molecule transformations. In this work, we first develop a benchmark curated for the retrosynthetic planning task, consisting of 124,869 reaction trees retrieved from the public USPTO-full dataset. On top of that, we propose Metro: Memory-Enhanced Transformer for RetrOsynthetic planning. Specifically, the dependency among molecules in the reaction tree is captured as context information for multi-step retrosynthesis predictions through transformers with a memory module. Extensive experiments show that Metro outperforms existing single-step retrosynthesis models by at least 10.7% in top-1 accuracy, demonstrating the benefit of exploiting context information in the retrosynthetic planning task. Moreover, because the proposed model is trained on reaction trees with the shortest depths, it can be used directly for synthetic accessibility analysis. Our work is the first step towards a new formulation of retrosynthetic planning in terms of data construction, model design, and evaluation. Code is available at https://github.com/SongtaoLiu0823/metro.
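To make the reaction-tree constraint concrete, the following is a minimal illustrative sketch and not the Metro implementation or the benchmark's data format. The `Node` class, the helper names, and the placeholder molecule labels are all assumptions introduced for illustration. It checks that every leaf of a tree belongs to the starting-material set and computes the tree depth, the quantity that the shortest-depth trees minimize.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One molecule in a reaction tree (hypothetical structure)."""
    molecule: str                      # placeholder label; a SMILES string in practice
    children: list[Node] = field(default_factory=list)  # reactants that produce it


def leaves(node: Node):
    """Yield the molecule label of every leaf under `node`."""
    if not node.children:
        yield node.molecule
    else:
        for child in node.children:
            yield from leaves(child)


def is_complete(root: Node, starting_materials: set[str]) -> bool:
    """True if every leaf molecule is an available starting material."""
    return all(m in starting_materials for m in leaves(root))


def depth(node: Node) -> int:
    """Number of reaction steps on the longest root-to-leaf path."""
    if not node.children:
        return 0
    return 1 + max(depth(child) for child in node.children)


# Toy two-step tree: target <- (intermediate, B); intermediate <- (A, C).
tree = Node("target", [
    Node("intermediate", [Node("A"), Node("C")]),
    Node("B"),
])

complete = is_complete(tree, {"A", "B", "C"})   # all leaves are purchasable
incomplete = is_complete(tree, {"A", "B"})      # leaf "C" is not available
steps = depth(tree)
```

In this toy tree the leaves are `A`, `C`, and `B`, so the tree is a valid synthesis route only when all three are in the starting-material set.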