Large-scale Protein Language Models (PLMs) have improved performance in protein prediction tasks, ranging from 3D structure prediction to various function predictions. In particular, AlphaFold, a ground-breaking AI system, could potentially reshape structural biology. However, the utility of the PLM module in AlphaFold, Evoformer, has not been explored beyond structure prediction. In this paper, we investigate the representation ability of three popular PLMs: ESM-1b (single sequence), MSA-Transformer (multiple sequence alignment), and Evoformer (structural), with a special focus on Evoformer. Specifically, we aim to answer the following key questions: (\romannumeral1) Does the Evoformer trained as part of AlphaFold produce representations amenable to predicting protein function? (\romannumeral2) If yes, can Evoformer replace ESM-1b and MSA-Transformer? (\romannumeral3) How much do these PLMs rely on evolution-related protein data? In this regard, are they complementary to each other? We compare these models through an empirical study and offer new insights and conclusions. Finally, we release our code and datasets for reproducibility.