Non-parametric neural language models (NLMs) learn predictive distributions of text by utilizing an external datastore, which allows them to learn through explicitly memorizing the training datapoints. While effective, these models often require retrieval from a large datastore at test time, significantly increasing the inference overhead and thus limiting the deployment of non-parametric NLMs in practical applications. In this paper, we take the recently proposed $k$-nearest neighbors language model (Khandelwal et al., 2019) as an example, exploring methods to improve its efficiency along various dimensions. Experiments on the standard WikiText-103 benchmark and domain-adaptation datasets show that our methods are able to achieve up to a 6x speed-up in inference while retaining comparable performance. The empirical analysis we present may provide guidelines for future research seeking to develop or deploy more efficient non-parametric NLMs.
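For context on the retrieval overhead discussed above, the following is a minimal sketch of the $k$NN-LM interpolation step at test time. It is illustrative only: the array names (`datastore_keys`, `datastore_vals`), the brute-force distance computation, and the interpolation weight `lam` are assumptions, and a practical system would retrieve from a large datastore with an approximate nearest-neighbor index rather than a flat scan.

```python
import numpy as np

def knn_lm_probs(query, p_lm, datastore_keys, datastore_vals, vocab_size, k=8, lam=0.25):
    """Interpolate the parametric LM distribution with a k-NN distribution.

    query          : (d,) hidden state of the current context
    p_lm           : (V,) next-token distribution from the parametric LM
    datastore_keys : (N, d) stored context vectors (keys)
    datastore_vals : (N,) stored next-token ids (values)
    """
    # Squared L2 distance from the query to every stored key.
    # (A deployed system would use an approximate index instead of this flat scan.)
    dists = np.sum((datastore_keys - query) ** 2, axis=1)
    nn_idx = np.argpartition(dists, k)[:k]  # indices of the k nearest neighbors

    # Softmax over negative distances assigns each neighbor a weight.
    weights = np.exp(-dists[nn_idx])
    weights /= weights.sum()

    # Aggregate neighbor weights by their stored next token to form p_kNN.
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, datastore_vals[nn_idx], weights)

    # Final distribution: linear interpolation of the two estimates.
    return lam * p_knn + (1.0 - lam) * p_lm
```

The test-time cost highlighted in the abstract comes from the retrieval step (computing `dists` and selecting neighbors) being repeated for every generated token, which is what the efficiency methods in the paper target.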