Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction

Retrosynthesis is the task of recursively breaking down a chemical compound, step by step, into molecular precursors until a set of commercially available molecules is found. The goal is to provide a valid synthesis route for a molecule. As more single-step models are developed, the accuracy of predicted molecular disconnections increases, potentially improving the creation of synthetic paths. Multi-step approaches repeatedly apply the chemical information stored in single-step retrosynthesis models. However, this connection is not reflected in contemporary research, which typically fixes either the single-step model or the multi-step algorithm. In this work, we establish a bridge between both tasks by benchmarking the performance of different single-step retrosynthesis models and their transfer to the multi-step domain, using two common search algorithms, Monte Carlo Tree Search and Retro*. We show that models designed for single-step retrosynthesis, when extended to multi-step search, can have a tremendous impact on the route-finding capabilities of current multi-step methods, improving performance by up to 30% compared to the most widely used model. Furthermore, we observe no clear link between contemporary single-step and multi-step evaluation metrics. This shows that, to find synthesis routes for molecules of interest, single-step models need to be developed and tested for the multi-step domain and not as an isolated task.
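The recursive decomposition described in the abstract can be illustrated with a minimal sketch. It assumes a toy lookup-table "single-step model" and a plain depth-limited depth-first search; it is not Monte Carlo Tree Search or Retro*, and it is not the authors' benchmark code. All names (`TOY_TEMPLATES`, `toy_model`, `find_route`) and the molecule labels are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A candidate disconnection: (model score, precursor molecules).
Disconnection = Tuple[float, Tuple[str, ...]]
SingleStepModel = Callable[[str], List[Disconnection]]
Reaction = Tuple[str, Tuple[str, ...]]  # (product, precursors)

# Hypothetical toy data standing in for a learned single-step model.
TOY_TEMPLATES: Dict[str, List[Disconnection]] = {
    "target": [(0.6, ("A", "B")), (0.4, ("C",))],
    "A": [(0.9, ("D", "E"))],
    "C": [(1.0, ("F",))],
}


def toy_model(molecule: str) -> List[Disconnection]:
    """Return candidate disconnections, highest score first."""
    return sorted(TOY_TEMPLATES.get(molecule, []), reverse=True)


def find_route(
    molecule: str,
    model: SingleStepModel,
    stock: Set[str],
    max_depth: int,
) -> Optional[List[Reaction]]:
    """Depth-limited depth-first search for a route ending in stock molecules.

    Returns the list of reactions used, or None if no route is found.
    """
    if molecule in stock:
        return []  # commercially available: nothing left to disconnect
    if max_depth == 0:
        return None  # search budget exhausted
    for _score, precursors in model(molecule):
        route: List[Reaction] = [(molecule, precursors)]
        for precursor in precursors:
            sub_route = find_route(precursor, model, stock, max_depth - 1)
            if sub_route is None:
                break  # this disconnection fails, try the next candidate
            route.extend(sub_route)
        else:
            return route  # every precursor was solved
    return None


if __name__ == "__main__":
    print(find_route("target", toy_model, {"B", "D", "E"}, max_depth=3))
```

Swapping `toy_model` for a different single-step model while keeping `find_route` fixed mirrors, in miniature, the kind of comparison the abstract describes: the same search procedure can succeed or fail depending on which disconnections the single-step model proposes.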

