OntoProtein: Protein Pretraining With Gene Ontology Embedding - 专知 (Zhuanzhi) Papers


February 15, 2022

OntoProtein: Protein Pretraining With Gene Ontology Embedding


Ningyu Zhang, Zhen Bi, Xiaozhuan Liang, Siyuan Cheng, Haosen Hong, Shumin Deng, Jiazhang Lian, Qiang Zhang, Huajun Chen

From arXiv; accepted at ICLR 2022

Self-supervised protein language models have proved effective at learning protein representations. With increasing computational power, current protein language models pre-trained on millions of diverse sequences have scaled from millions to billions of parameters and achieved remarkable improvements. However, these prevailing approaches rarely incorporate knowledge graphs (KGs), which can provide rich structured knowledge facts for better protein representations. We argue that the informative biological knowledge in KGs can enhance protein representations. In this work, we propose OntoProtein, the first general framework that incorporates the structure of GO (Gene Ontology) into protein pre-training models. We construct a novel large-scale knowledge graph that consists of GO and its related proteins, in which every node is described by gene annotation text or a protein sequence. We propose a novel contrastive learning approach with knowledge-aware negative sampling to jointly optimize the knowledge graph and protein embeddings during pre-training. Experimental results show that OntoProtein can surpass state-of-the-art methods with pre-trained protein language models on the TAPE benchmark and yields better performance than baselines in protein-protein interaction and protein function prediction. Code and datasets are available at https://github.com/zjunlp/OntoProtein.
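The abstract does not spell out how the contrastive objective or the knowledge-aware negative sampling is defined, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes a TransE-style distance as the scoring function, a margin-based log-sigmoid loss, and a sampler that corrupts the tail of a triple with GO terms from the same aspect as the true tail. Every identifier, the toy graph, the margin value, and the random vectors standing in for encoder outputs are hypothetical.

```python
import math
import random

# Hypothetical toy graph; none of these names come from the paper's dataset.
ASPECT = {
    "GO:a": "molecular_function",
    "GO:b": "molecular_function",
    "GO:c": "molecular_function",
    "GO:d": "biological_process",
    "GO:e": "biological_process",
}
TRIPLES = {("P1", "enables", "GO:a"), ("P2", "involved_in", "GO:d")}


def sample_negative_tails(triple, k, rng):
    """Corrupt the tail with GO terms that share the true tail's aspect.

    Assumption: "knowledge-aware" means negatives are drawn from the same
    ontology aspect, which makes them harder than uniformly random ones.
    """
    head, rel, tail = triple
    candidates = [
        t for t, aspect in ASPECT.items()
        if aspect == ASPECT[tail] and t != tail and (head, rel, t) not in TRIPLES
    ]
    return [rng.choice(candidates) for _ in range(k)]


def distance(h, r, t):
    """TransE-style distance ||h + r - t||_2 (an assumed scoring function)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def contrastive_loss(emb, triple, negatives, margin=1.0):
    """Pull the true triple inside the margin, push corrupted triples outside."""
    head, rel, tail = triple
    pos = -log_sigmoid(margin - distance(emb[head], emb[rel], emb[tail]))
    neg = -sum(
        log_sigmoid(distance(emb[head], emb[rel], emb[n]) - margin)
        for n in negatives
    ) / len(negatives)
    return pos + neg


rng = random.Random(0)
dim = 4
names = list(ASPECT) + ["P1", "P2", "enables", "involved_in"]
# Random vectors stand in for the outputs of a protein encoder and a text encoder.
emb = {name: [rng.uniform(-1, 1) for _ in range(dim)] for name in names}

triple = ("P1", "enables", "GO:a")
negatives = sample_negative_tails(triple, k=3, rng=rng)
loss = contrastive_loss(emb, triple, negatives)
print(negatives, round(loss, 4))
```

In a real pre-training setup the protein vectors would presumably come from the protein language model and the GO vectors from a text encoder, so that gradients from this loss update both; the plain lists here only illustrate the shape of the computation.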

