We present $\textbf{MolT5}$ $-$ a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. $\textbf{MolT5}$ allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (altogether: translation between molecules and language), which we explore for the first time. Since $\textbf{MolT5}$ pretrains models on single-modal data, it helps overcome the data scarcity problem in the chemistry domain. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate the tasks of molecule captioning and text-based molecule generation. Our results show that $\textbf{MolT5}$-based models are able to generate outputs, both molecules and captions, which in many cases are of high quality.
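As an illustration of the molecule-to-language direction described above, the sketch below shows how a MolT5-style sequence-to-sequence model could be queried to caption a molecule given as a SMILES string. The Hugging Face checkpoint identifier is an assumption used only for illustration; substitute whichever released MolT5 checkpoint you actually use.

```python
# Minimal sketch: molecule captioning with a MolT5-style seq2seq model.
# The checkpoint id below is assumed for illustration, not confirmed by this abstract.
from transformers import T5Tokenizer, T5ForConditionalGeneration

MODEL_NAME = "laituan245/molt5-base-smiles2caption"  # assumed checkpoint id

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

# Input: a molecule as a SMILES string (caffeine).
# Output: a natural-language description of the molecule.
smiles = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
inputs = tokenizer(smiles, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The reverse task, text-based de novo molecule generation, follows the same pattern with the roles of caption and SMILES string swapped.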