根基 SMILES:化学反应预测的严密代表 (Root-aligned SMILES: A Tight Representation for Chemical Reaction Prediction) - 专知论文

会员服务 ·

0

可约的 · 学成 · 知识 (knowledge) · Performer · state-of-the-art ·

2022 年 5 月 4 日

Root-aligned SMILES: A Tight Representation for Chemical Reaction Prediction

翻译：根基 SMILES:化学反应预测的严密代表

Zipeng Zhong,Jie Song,Zunlei Feng,Tiantao Liu,Lingxiang Jia,Shaolun Yao,Min Wu,Tingjun Hou,Mingli Song

from arxiv, Main paper: 23 pages, 5 figures, and 3 table; supplementary information: 12 pages, 5 figures and 2 tables. Code repository: https://github.com/otori-bird/retrosynthesis

Chemical reaction prediction, involving forward synthesis and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem, where the typical SMILES is adopted for molecule representations. However, the general-purpose SMILES neglects the characteristics of chemical reactions, where the molecular graph topology is largely unaltered from reactants to products, resulting in the suboptimal performance of SMILES if straightforwardly applied. In this article, we propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES for more efficient synthesis prediction. Due to the strict one-to-one mapping and reduced edit distance, the computational model is largely relieved from learning the complex syntax and dedicated to learning the chemical knowledge for reactions. We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all, demonstrating the superiority of the proposed method.

翻译：化学反应预测,包括前期合成和反转合成预测,是有机合成的一个根本问题。流行的计算模式将合成预测作为一种序列到顺序的翻译问题,对分子表示采用典型的SMILES。然而,通用SMILES忽略了化学反应的特性,分子图示表层基本上没有从反应剂向产品转变,因此如果直接应用的话,SMILES的性能不尽人意。我们在本篇文章中提议了根整齐的SMILES(R-SMILES),它规定了产品与反应剂SMILES之间的一对一的精确匹配绘图,以便更高效的合成预测。由于严格的一对一的绘图和缩短编辑距离,计算模型基本上从学习复杂的语法和专门学习反应的化学知识中解脱脱去。我们把拟议的R-SMILES与各种最新基准进行比较,并表明它大大超越了它们,显示了拟议方法的优越性。

0

相关内容

可约的

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

两类带导数的非线性Schrodinger方程拟周期解的存在性

国家自然科学基金

0+阅读 · 2015年12月31日

Decorin对急性缺血性卒中后血脑屏障中ZO-1蛋白的作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

煤粉非均相MILD燃烧及燃料型NOx生成机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

氢化反应制备轻金属硅化物的反应机理及其结构和电化学性能

国家自然科学基金

0+阅读 · 2014年12月31日

雾霾环境下城市机动车污染物扩散的时空动态特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

sHLA-G调控MSCs治疗SLE的疗效及其机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于改进的Co-Kriging模型的高维气动优化设计新方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于软硬骨组织生物力学特性的机器人辅助颈椎间盘置换手术磨削机理

国家自然科学基金

0+阅读 · 2012年12月31日

高超声速钻地弹对混凝土靶体的毁伤机理及工程防护研究

国家自然科学基金

2+阅读 · 2012年12月31日

改进的Unscented卡尔曼滤波与电池组SOC快速精确估计

国家自然科学基金

0+阅读 · 2008年12月31日

Learning to Estimate and Refine Fluid Motion with Physical Dynamics

Arxiv

0+阅读 · 2022年6月22日

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

Arxiv

0+阅读 · 2022年6月21日

Improving numerical accuracy for the viscous-plastic formulation of sea ice

Arxiv

0+阅读 · 2022年6月21日

Modelling Populations of Interaction Networks via Distance Metrics

Arxiv

0+阅读 · 2022年6月20日

Actively learning to learn causal relationships

Arxiv

0+阅读 · 2022年6月20日

Abstraction-Based Segmental Simulation of Chemical Reaction Networks

Arxiv

0+阅读 · 2022年6月18日

TLETA: Deep Transfer Learning and Integrated Cellular Knowledge for Estimated Time of Arrival Prediction

Arxiv

0+阅读 · 2022年6月17日

Time-Series Event Prediction with Evolutionary State Graph

Arxiv

14+阅读 · 2020年11月25日

Ripple Network: Propagating User Preferences on the Knowledge Graph for Recommender Systems

Arxiv

14+阅读 · 2018年5月19日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

VIP会员

文章信息

相关主题

知识 (knowledge)

state-of-the-art

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型智能体强化学习：全景综述

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

【伯克利博士论文】从推理服务到训练：面向大规模 LLM 智能体的高效系统

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Learning to Estimate and Refine Fluid Motion with Physical Dynamics

Arxiv

0+阅读 · 2022年6月22日

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

Arxiv

0+阅读 · 2022年6月21日

Improving numerical accuracy for the viscous-plastic formulation of sea ice

Arxiv

0+阅读 · 2022年6月21日

Modelling Populations of Interaction Networks via Distance Metrics

Arxiv

0+阅读 · 2022年6月20日

Actively learning to learn causal relationships

Arxiv

0+阅读 · 2022年6月20日

Abstraction-Based Segmental Simulation of Chemical Reaction Networks

Arxiv

0+阅读 · 2022年6月18日

TLETA: Deep Transfer Learning and Integrated Cellular Knowledge for Estimated Time of Arrival Prediction

Arxiv

0+阅读 · 2022年6月17日

Time-Series Event Prediction with Evolutionary State Graph

Arxiv

14+阅读 · 2020年11月25日

Ripple Network: Propagating User Preferences on the Knowledge Graph for Recommender Systems

Arxiv

14+阅读 · 2018年5月19日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

相关基金

两类带导数的非线性Schrodinger方程拟周期解的存在性

国家自然科学基金

0+阅读 · 2015年12月31日

Decorin对急性缺血性卒中后血脑屏障中ZO-1蛋白的作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

煤粉非均相MILD燃烧及燃料型NOx生成机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

氢化反应制备轻金属硅化物的反应机理及其结构和电化学性能

国家自然科学基金

0+阅读 · 2014年12月31日

雾霾环境下城市机动车污染物扩散的时空动态特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

sHLA-G调控MSCs治疗SLE的疗效及其机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于改进的Co-Kriging模型的高维气动优化设计新方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于软硬骨组织生物力学特性的机器人辅助颈椎间盘置换手术磨削机理

国家自然科学基金

0+阅读 · 2012年12月31日

高超声速钻地弹对混凝土靶体的毁伤机理及工程防护研究

国家自然科学基金

2+阅读 · 2012年12月31日

改进的Unscented卡尔曼滤波与电池组SOC快速精确估计

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员