Transformer models have recently emerged as one of the foundational models in natural language processing, and as a byproduct, there has been significant recent interest and investment in scaling these models. However, the training and inference costs of these large Transformer language models are prohibitive, necessitating more research into identifying more efficient variants. In this work, we propose a simple yet effective modification to the Transformer architecture, inspired by the statistical language modeling literature: we augment the model with n-grams constructed from a discrete latent representation of the text sequence. We evaluate our model, the N-Grammer, on language modeling on the C4 data-set as well as on text classification on the SuperGLUE data-set, and find that it outperforms several strong baselines such as the Transformer and the Primer. We open-source our model in Jax for reproducibility purposes.
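To make the core idea concrete, the following is a minimal sketch in jax.numpy of augmenting token embeddings with embeddings of latent bigrams: each position is discretized against a codebook, consecutive codes form bigram ids, and those ids are hashed into a fixed-size n-gram embedding table. All names, sizes, and the concatenation step are illustrative assumptions for this sketch, not the released N-Grammer implementation.

```python
# A hedged sketch of the latent n-gram augmentation idea, not the paper's exact code.
import jax
import jax.numpy as jnp

def latent_ngram_features(token_emb, cluster_centers, ngram_table):
    """token_emb: [seq_len, dim] token embeddings.
    cluster_centers: [num_clusters, dim] codebook used to discretize embeddings.
    ngram_table: [ngram_vocab_size, ngram_dim] embedding table for hashed bigram ids.
    Returns token embeddings concatenated with latent-bigram embeddings."""
    # 1) Discretize each position: nearest cluster center gives a latent code.
    dists = jnp.sum((token_emb[:, None, :] - cluster_centers[None, :, :]) ** 2, axis=-1)
    codes = jnp.argmin(dists, axis=-1)                      # [seq_len]

    # 2) Form bigram ids from consecutive latent codes (position 0 pairs with itself).
    prev = jnp.concatenate([codes[:1], codes[:-1]])
    num_clusters = cluster_centers.shape[0]
    bigram_ids = codes + prev * num_clusters                # [seq_len]

    # 3) Hash the bigram ids into a fixed-size n-gram vocabulary and look them up.
    hashed = bigram_ids % ngram_table.shape[0]
    ngram_emb = ngram_table[hashed]                         # [seq_len, ngram_dim]

    # 4) Combine with the original token embeddings (concatenation here; the paper
    #    may use a different combination).
    return jnp.concatenate([token_emb, ngram_emb], axis=-1)

# Toy usage with random parameters.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
token_emb = jax.random.normal(k1, (16, 64))
cluster_centers = jax.random.normal(k2, (256, 64))
ngram_table = jax.random.normal(k3, (4096, 32))
out = latent_ngram_features(token_emb, cluster_centers, ngram_table)
print(out.shape)  # (16, 96)
```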