Recent progress in language modeling has been driven not only by advances in neural architectures, but also by improvements in hardware and optimization. In this paper, we revisit the neural probabilistic language model (NPLM) of~\citet{Bengio2003ANP}, which simply concatenates word embeddings within a fixed window and passes the result through a feed-forward network to predict the next word. When scaled up to modern hardware, this model (despite its many limitations) performs much better than expected on word-level language model benchmarks. Our analysis reveals that the NPLM achieves lower perplexity than a baseline Transformer with short input contexts, but struggles to handle long-term dependencies. Inspired by this result, we modify the Transformer by replacing its first self-attention layer with the NPLM's local concatenation layer, which results in small but consistent perplexity decreases across three word-level language modeling datasets.
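For concreteness, the sketch below shows one way the NPLM-style local concatenation described above could be realized in PyTorch: embeddings of the last few tokens are concatenated and fed through a feed-forward network that produces next-word logits. The class name, layer sizes, window length, and activation are illustrative assumptions, not the paper's configuration.

\begin{verbatim}
import torch
import torch.nn as nn

class ConcatWindowLM(nn.Module):
    """Minimal NPLM-style sketch (illustrative sizes, not the paper's):
    concatenate the embeddings of the last `window` tokens and map them
    through a feed-forward network to next-word logits."""

    def __init__(self, vocab_size=10000, d_embed=128, window=5, d_hidden=512):
        super().__init__()
        self.window = window
        self.embed = nn.Embedding(vocab_size, d_embed)
        self.ff = nn.Sequential(
            nn.Linear(window * d_embed, d_hidden),
            nn.Tanh(),
            nn.Linear(d_hidden, vocab_size),
        )

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); each position predicts the next token
        x = self.embed(token_ids)                      # (batch, seq_len, d_embed)
        # left-pad so every position sees a full window of local context
        pad = x.new_zeros(x.size(0), self.window - 1, x.size(2))
        x = torch.cat([pad, x], dim=1)
        # slide a window over the sequence and concatenate along features
        windows = x.unfold(1, self.window, 1)          # (batch, seq_len, d_embed, window)
        windows = windows.transpose(2, 3).flatten(2)   # (batch, seq_len, window*d_embed)
        return self.ff(windows)                        # next-token logits
\end{verbatim}

The same concatenation operation, placed in front of the remaining self-attention layers, is the kind of local first layer the modified Transformer described above would use; here it is shown standalone purely for illustration.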