AMOM: 有条件的蒙面语言模版遮罩的适应性遮罩</s> (AMOM: Adaptive Masking over Masking for Conditional Masked Language Model) - 专知论文

会员服务 ·

0

Performer · 掩码 · 掩码语言模型化 · 语言模型化 · BLEU ·

2023 年 3 月 13 日

AMOM: Adaptive Masking over Masking for Conditional Masked Language Model

翻译：AMOM: 有条件的蒙面语言模版遮罩的适应性遮罩

Yisheng Xiao,Ruiyang Xu,Lijun Wu,Juntao Li,Tao Qin,Yan-Tie Liu,Min Zhang

from arxiv, Accepted by AAAI2023

Transformer-based autoregressive (AR) methods have achieved appealing performance for varied sequence-to-sequence generation tasks, e.g., neural machine translation, summarization, and code generation, but suffer from low inference efficiency. To speed up the inference stage, many non-autoregressive (NAR) strategies have been proposed in the past few years. Among them, the conditional masked language model (CMLM) is one of the most versatile frameworks, as it can support many different sequence generation scenarios and achieve very competitive performance on these tasks. In this paper, we further introduce a simple yet effective adaptive masking over masking strategy to enhance the refinement capability of the decoder and make the encoder optimization easier. Experiments on \textbf{3} different tasks (neural machine translation, summarization, and code generation) with \textbf{15} datasets in total confirm that our proposed simple method achieves significant performance improvement over the strong CMLM model. Surprisingly, our proposed model yields state-of-the-art performance on neural machine translation (\textbf{34.62} BLEU on WMT16 EN$\to$RO, \textbf{34.82} BLEU on WMT16 RO$\to$EN, and \textbf{34.84} BLEU on IWSLT De$\to$En) and even better performance than the \textbf{AR} Transformer on \textbf{7} benchmark datasets with at least \textbf{2.2$\times$} speedup. Our code is available at GitHub.

翻译：以变换器为基础的自动递增( AR) 方法在各种序列生成任务中取得了吸引人的性能, 例如神经机器翻译、合成和代码生成等, 但却受到低推断效率的影响。为了加快推断阶段, 在过去几年里提出了许多非自动递增( NAR) 战略。其中, 有条件的隐含语言模式( CMLM) 是最通用的框架之一, 因为它可以支持许多不同的序列生成情景, 并实现这些任务上非常有竞争力的性能。在本文中, 我们还引入了一个简单而有效的掩罩策略, 以提高解码机的精细化能力, 并使编码优化变得更容易。在\ textbfff{15} 数据库中, 有条件的隐含语言模式( CMLM) 能够支持许多不同的序列生成显著的性能改进。值得注意的是, 我们提议的模型在 NEOF$( $) 和 NEUFMT 上生成了比 NEULF 更高级的性能。</s>

0

相关内容

Performer

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

不可错过！首门《自监督学习统计模型》课程！霍普金斯Daniel Khashabi讲授

不可错过！首门《自监督学习统计模型》课程！霍普金斯Daniel Khashabi讲授

专知会员服务

24+阅读 · 2022年9月30日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

专知

19+阅读 · 2018年6月1日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

新基因NDRG2及其蛋白产物Ndrg2磷酸化修饰在心肌缺血/再灌注损伤中的作用

国家自然科学基金

0+阅读 · 2015年12月31日

小麦太谷核不育基因Ms2的图位克隆

国家自然科学基金

0+阅读 · 2014年12月31日

电休克刺激对SD大鼠脑中枢神经系统可塑性的影响

国家自然科学基金

0+阅读 · 2014年12月31日

mTOR功能性单倍体通过ERS-IRE1/α-JNK通路调控乳腺癌细胞药物敏感性的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

miR-455-5p介导的ADMA代谢和转运网络调控在内皮损伤中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

基于多示例学习的多模态信息表达与推荐方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

利用MAGIC群体解析陆地棉重要经济性状的遗传基础

国家自然科学基金

0+阅读 · 2012年12月31日

新型STAT3信号通路抑制剂KT53504构效关系和抗肿瘤分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

PTPRR-ERK介导的神经可塑性在抑郁症发生发展中的作用机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

mTOR信号通路调控ROCK1翻译在糖尿病认知功能障碍发病机制中的作用研究

国家自然科学基金

0+阅读 · 2011年12月31日

Masked Trajectory Models for Prediction, Representation, and Control

Arxiv

0+阅读 · 2023年5月4日

SLTUNET: A Simple Unified Model for Sign Language Translation

Arxiv

0+阅读 · 2023年5月2日

Conditional Prompt Learning for Vision-Language Models

Conditional Prompt Learning for Vision-Language Models

Arxiv

13+阅读 · 2022年3月10日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

Attention Bottlenecks for Multimodal Fusion

Arxiv

31+阅读 · 2021年6月30日

Adaptive Methods for Real-World Domain Generalization

Arxiv

13+阅读 · 2021年3月29日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

VIP会员

文章信息

相关主题

掩码语言模型化

语言模型化

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

不可错过！首门《自监督学习统计模型》课程！霍普金斯Daniel Khashabi讲授

不可错过！首门《自监督学习统计模型》课程！霍普金斯Daniel Khashabi讲授

专知会员服务

24+阅读 · 2022年9月30日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《小型无人机系统侦测追踪技术：声学、计算机视觉与深度学习融合方案》最新98页

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

光纤无人机：反无人机系统的重大挑战

《作战建模与仿真实证研究》

相关资讯

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

专知

19+阅读 · 2018年6月1日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

相关论文

Masked Trajectory Models for Prediction, Representation, and Control

Arxiv

0+阅读 · 2023年5月4日

SLTUNET: A Simple Unified Model for Sign Language Translation

Arxiv

0+阅读 · 2023年5月2日

Conditional Prompt Learning for Vision-Language Models

Conditional Prompt Learning for Vision-Language Models

Arxiv

13+阅读 · 2022年3月10日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

Attention Bottlenecks for Multimodal Fusion

Arxiv

31+阅读 · 2021年6月30日

Adaptive Methods for Real-World Domain Generalization

Arxiv

13+阅读 · 2021年3月29日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

相关基金

新基因NDRG2及其蛋白产物Ndrg2磷酸化修饰在心肌缺血/再灌注损伤中的作用

国家自然科学基金

0+阅读 · 2015年12月31日

小麦太谷核不育基因Ms2的图位克隆

国家自然科学基金

0+阅读 · 2014年12月31日

电休克刺激对SD大鼠脑中枢神经系统可塑性的影响

国家自然科学基金

0+阅读 · 2014年12月31日

mTOR功能性单倍体通过ERS-IRE1/α-JNK通路调控乳腺癌细胞药物敏感性的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

miR-455-5p介导的ADMA代谢和转运网络调控在内皮损伤中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

基于多示例学习的多模态信息表达与推荐方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

利用MAGIC群体解析陆地棉重要经济性状的遗传基础

国家自然科学基金

0+阅读 · 2012年12月31日

新型STAT3信号通路抑制剂KT53504构效关系和抗肿瘤分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

PTPRR-ERK介导的神经可塑性在抑郁症发生发展中的作用机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

mTOR信号通路调控ROCK1翻译在糖尿病认知功能障碍发病机制中的作用研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员