Chemical reaction prediction, which covers both forward synthesis and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem in which molecules are represented as SMILES strings. However, general-purpose SMILES ignores a key characteristic of chemical reactions: the molecular graph topology is largely unaltered from reactants to products. As a result, SMILES performs suboptimally when applied straightforwardly. In this article, we propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES for more efficient synthesis prediction. Owing to the strict one-to-one mapping and the reduced edit distance, the computational model is largely relieved from learning complex syntax and can instead be dedicated to learning the chemical knowledge of reactions. We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all, demonstrating the superiority of the proposed method.
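The reduced edit distance mentioned above can be illustrated with a toy example. The sketch below is not the authors' procedure: the SMILES strings are written by hand for an assumed reaction (ethyl acetate as the product, acetic acid and ethanol as the reactants), and `edit_distance` is a plain character-level Levenshtein distance chosen for illustration. It compares a product/reactant pair written from the same starting atom with a pair written from different starting atoms.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hand-written SMILES for the assumed toy reaction (illustrative only).
# Aligned pair: product and reactants both start from the same methyl carbon
# and traverse the shared atoms in the same order.
product_aligned = "CC(=O)OCC"
reactants_aligned = "CC(=O)O.OCC"

# Unaligned pair: the same molecules, but the product string starts from
# the ethyl end, so shared substructures no longer line up.
product_other = "CCOC(C)=O"
reactants_other = "CC(=O)O.CCO"

aligned = edit_distance(product_aligned, reactants_aligned)
unaligned = edit_distance(product_other, reactants_other)
print("aligned:", aligned, "unaligned:", unaligned)
```

In this toy case the aligned pair differs only by the inserted `.O`, while the unaligned pair needs more edits even though the underlying molecules are identical. Every atom here is a single character, so character-level comparison is adequate; a real pipeline would tokenize SMILES so that multi-character atoms such as `Cl` or `Br` count as one token.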