Retrosynthesis, whose goal is to find a set of reactants for synthesizing a target product, is an emerging research area in deep learning. While existing approaches have shown promising results, they currently cannot account for the availability (e.g., stability or purchasability) of the reactants, nor can they generalize to unseen reaction templates (i.e., chemical reaction rules). In this paper, we propose a new approach that mitigates these issues by reformulating retrosynthesis as the problem of selecting reactants from a candidate set of commercially available molecules. To this end, we design an efficient reactant selection framework, named RetCL (retrosynthesis via contrastive learning), which enumerates all of the candidate molecules based on selection scores computed by graph neural networks. To learn the score functions, we also propose a novel contrastive training scheme with hard negative mining. Extensive experiments demonstrate the benefits of the proposed selection-based approach. For example, when all 671k reactants in the USPTO database are given as candidates, RetCL achieves a top-1 exact match accuracy of $71.3\%$ on the USPTO-50k benchmark, while a recent transformer-based approach achieves $59.6\%$. We also demonstrate that, in contrast to template-based approaches, RetCL generalizes well to unseen templates in various settings.
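To make the selection-based formulation concrete, the following is a minimal sketch of how candidates might be ranked by a score and how a contrastive objective with hard negatives could be set up. It is not the RetCL implementation: the hard-coded vectors stand in for graph neural network embeddings, and the cosine score, the temperature value, and all function names are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed score function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(product_emb, candidate_embs, k=1):
    """Score every candidate against the product; return top-k indices and all scores."""
    scores = [cosine(product_emb, c) for c in candidate_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores


def hard_negatives(scores, positive_idx, n):
    """Pick the n highest-scoring candidates that are not the true reactant."""
    negatives = [i for i in range(len(scores)) if i != positive_idx]
    negatives.sort(key=lambda i: scores[i], reverse=True)
    return negatives[:n]


def contrastive_loss(scores, positive_idx, temperature=0.1):
    """Softmax cross-entropy over candidates, with the true reactant as the target."""
    logits = [s / temperature for s in scores]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[positive_idx]


if __name__ == "__main__":
    # Toy embeddings: in practice these would come from a trained encoder.
    product = [1.0, 0.0]
    candidates = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
    top, scores = rank_candidates(product, candidates, k=1)
    print("top-1 candidate index:", top)
    print("hard negatives for positive 0:", hard_negatives(scores, 0, 2))
    print("loss with positive 0:", contrastive_loss(scores, 0))
```

In this toy setup, the loss is small when the true reactant already has the highest score and grows when a negative outranks it, which is why training on the highest-scoring (hardest) negatives is informative.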