LaMemo: Language Modeling with Look-Ahead Memory - Zhuanzhi Papers


April 26, 2022


Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Minlie Huang

From arXiv; accepted by NAACL 2022

Although Transformers with fully connected self-attention are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One solution is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment, which encode context in a uni-directional way. As a result, the memory cannot dynamically interact with the current context, which provides up-to-date information for token prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo), which enhances the recurrence memory by incrementally attending to the right-side tokens and interpolating with the old memory states to maintain long-term information in the history. LaMemo embraces bi-directional attention and segment recurrence with an additional computation overhead that is only linearly proportional to the memory length. Experiments on widely used language modeling benchmarks demonstrate its superiority over baselines equipped with different types of memory.
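The idea of refreshing the memory by looking ahead can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `look_ahead_update`, the single attention head without learned projections, and the fixed interpolation weight `alpha` are all assumptions made for illustration, since the abstract does not specify how the interpolation is computed. The sketch only shows cached memory states attending to the tokens on their right (the current segment) and then being blended with their old values.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def look_ahead_update(memory, segment, alpha=0.5):
    """Refresh cached memory states using the current segment.

    memory:  (M, d) hidden states cached from the previous segment
    segment: (N, d) hidden states of the current segment (right-side tokens)
    alpha:   hypothetical fixed weight kept on the old memory states
             (an assumption; the abstract does not give the actual rule)
    """
    d = memory.shape[-1]
    # Each memory position attends to every current-segment position.
    scores = memory @ segment.T / np.sqrt(d)      # shape (M, N)
    look_ahead = softmax(scores, axis=-1) @ segment  # shape (M, d)
    # Interpolate with the old memory so long-term information is retained.
    return alpha * memory + (1.0 - alpha) * look_ahead


rng = np.random.default_rng(0)
mem = rng.normal(size=(4, 8))   # 4 memory states, hidden size 8
seg = rng.normal(size=(6, 8))   # 6 current-segment states
new_mem = look_ahead_update(mem, seg, alpha=0.5)
print(new_mem.shape)
```

In this simplified form the extra work is one `(M, N)` score matrix per update, so for a fixed segment length the added cost grows linearly with the memory length `M`, which matches the overhead described in the abstract.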

