Linguistically inspired roadmap for building biologically reliable protein language models - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · Guidance · 词元化 · Integration ·

2023 年 4 月 28 日

Linguistically inspired roadmap for building biologically reliable protein language models

翻译：暂无翻译

Mai Ha Vu,Rahmad Akbar,Philippe A. Robert,Bartlomiej Swiatczak,Victor Greiff,Geir Kjetil Sandve,Dag Trygve Truslew Haug

from arxiv, 27 pages, 4 figures

Deep neural-network-based language models (LMs) are increasingly applied to large-scale protein sequence data to predict protein function. However, being largely black-box models and thus challenging to interpret, current protein LM approaches do not contribute to a fundamental understanding of sequence-function mappings, hindering rule-based biotherapeutic drug development. We argue that guidance drawn from linguistics, a field specialized in analytical rule extraction from natural language data, can aid with building more interpretable protein LMs that are more likely to learn relevant domain-specific rules. Differences between protein sequence data and linguistic sequence data require the integration of more domain-specific knowledge in protein LMs compared to natural language LMs. Here, we provide a linguistics-based roadmap for protein LM pipeline choices with regard to training data, tokenization, token embedding, sequence embedding, and model interpretation. Incorporating linguistic ideas into protein LMs enables the development of next-generation interpretable machine-learning models with the potential of uncovering the biological mechanisms underlying sequence-function relationships.

翻译：暂无翻译

0

相关内容

语言模型化

语言模型化

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【Awesome】最全的机器学习可解释性资料（machine-learning-interpretability）

【Awesome】最全的机器学习可解释性资料（machine-learning-interpretability）

专知

29+阅读 · 2019年3月1日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

自然语言处理 (NLP)资源大全

自然语言处理 (NLP)资源大全

机械鸡

35+阅读 · 2017年9月17日

MARVELD1基因调控肝细胞癌介入治疗的机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

茄青枯拉尔氏菌Rs-T02编码精氨酸脱亚胺酶基因的功能分析

国家自然科学基金

0+阅读 · 2014年12月31日

一种高效诱导广谱中和抗体的丙型肝炎疫苗的研究

国家自然科学基金

0+阅读 · 2012年12月31日

流形上的几何与分析

国家自然科学基金

0+阅读 · 2011年12月31日

去酰基化ghrelin改善脂肪组织炎症所致胰岛素抵抗的机制- - 调节性T细胞的作用

国家自然科学基金

0+阅读 · 2011年12月31日

羊痘病毒ORFV119蛋白与宿主细胞相互作用的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

巨噬细胞特异表达VSIG4抑制T细胞活化在调节动脉粥样硬化斑块形成及稳定中的作用研究

国家自然科学基金

0+阅读 · 2009年12月31日

从类胡萝卜素特异结合蛋白提取纯化研究动物富集类胡萝卜素机理

国家自然科学基金

0+阅读 · 2009年12月31日

基于电磁拓扑的高功率电磁环境下电子系统耦合参数的分析与计算

国家自然科学基金

0+阅读 · 2008年12月31日

分数阶微分方程的数值计算和动力学行为

国家自然科学基金

0+阅读 · 2008年12月31日

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

Arxiv

0+阅读 · 2023年6月13日

Leveraging Skill-to-Skill Supervision for Knowledge Tracing

Arxiv

0+阅读 · 2023年6月12日

Predicting Software Performance with Divide-and-Learn

Arxiv

0+阅读 · 2023年6月11日

Universal Language Modelling agent

Arxiv

0+阅读 · 2023年6月10日

Can Large Language Models Infer Causation from Correlation?

Arxiv

5+阅读 · 2023年6月9日

A Roadmap for Big Model

Arxiv

76+阅读 · 2022年3月26日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

A Survey of Human-in-the-loop for Machine Learning

Arxiv

37+阅读 · 2021年8月2日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

Challenges in Building Intelligent Open-domain Dialog Systems

Arxiv

21+阅读 · 2019年5月13日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

机器人领域中最佳的三维场景表示是什么？——从几何表示到基础模型

《多域作战兵棋推演：运用形态学分析与人工智能加强国防人员训练》

【博士论文】快速高效的归一化流及其在图像生成模型中的应用

仿生机器人技术的军事应用

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【Awesome】最全的机器学习可解释性资料（machine-learning-interpretability）

【Awesome】最全的机器学习可解释性资料（machine-learning-interpretability）

专知

29+阅读 · 2019年3月1日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

自然语言处理 (NLP)资源大全

自然语言处理 (NLP)资源大全

机械鸡

35+阅读 · 2017年9月17日

相关论文

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

Arxiv

0+阅读 · 2023年6月13日

Leveraging Skill-to-Skill Supervision for Knowledge Tracing

Arxiv

0+阅读 · 2023年6月12日

Predicting Software Performance with Divide-and-Learn

Arxiv

0+阅读 · 2023年6月11日

Universal Language Modelling agent

Arxiv

0+阅读 · 2023年6月10日

Can Large Language Models Infer Causation from Correlation?

Arxiv

5+阅读 · 2023年6月9日

A Roadmap for Big Model

Arxiv

76+阅读 · 2022年3月26日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

A Survey of Human-in-the-loop for Machine Learning

Arxiv

37+阅读 · 2021年8月2日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

Challenges in Building Intelligent Open-domain Dialog Systems

Arxiv

21+阅读 · 2019年5月13日

相关基金

MARVELD1基因调控肝细胞癌介入治疗的机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

茄青枯拉尔氏菌Rs-T02编码精氨酸脱亚胺酶基因的功能分析

国家自然科学基金

0+阅读 · 2014年12月31日

一种高效诱导广谱中和抗体的丙型肝炎疫苗的研究

国家自然科学基金

0+阅读 · 2012年12月31日

流形上的几何与分析

国家自然科学基金

0+阅读 · 2011年12月31日

去酰基化ghrelin改善脂肪组织炎症所致胰岛素抵抗的机制- - 调节性T细胞的作用

国家自然科学基金

0+阅读 · 2011年12月31日

羊痘病毒ORFV119蛋白与宿主细胞相互作用的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

巨噬细胞特异表达VSIG4抑制T细胞活化在调节动脉粥样硬化斑块形成及稳定中的作用研究

国家自然科学基金

0+阅读 · 2009年12月31日

从类胡萝卜素特异结合蛋白提取纯化研究动物富集类胡萝卜素机理

国家自然科学基金

0+阅读 · 2009年12月31日

基于电磁拓扑的高功率电磁环境下电子系统耦合参数的分析与计算

国家自然科学基金

0+阅读 · 2008年12月31日

分数阶微分方程的数值计算和动力学行为

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员