Why do Nearest Neighbor Language Models Work? - Zhuanzhi Papers


Language modeling · Performer · MoDELS · KNN · Nearest neighbor

January 17, 2023

Why do Nearest Neighbor Language Models Work?


Frank F. Xu, Uri Alon, Graham Neubig

From arXiv, preprint, 21 pages

Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations through a neural network consuming the immediate previous context. Recently, however, retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore, in addition to their standard, parametric, next-word prediction. In this paper, we set out to understand why retrieval-augmented language models, and specifically k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on. To this end, we perform a careful analysis of the various dimensions over which kNN-LM diverges from standard LMs, and investigate these dimensions one by one. Empirically, we identify three main reasons why kNN-LM performs better than standard LMs: using a different input representation for predicting the next tokens, approximate kNN search, and the importance of softmax temperature for the kNN distribution. Further, we incorporate these insights into the model architecture or the training procedure of the standard parametric LM, improving its results without the need for an explicit retrieval component. The code is available at https://github.com/frankxu2004/knnlm-why.
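To make the role of the softmax temperature and the retrieval component concrete, here is a minimal sketch of how a kNN-LM prediction is commonly formed: a distribution is built from the nearest datastore entries and then interpolated with the parametric LM's distribution. This is not the authors' implementation. The function names, the toy datastore, the interpolation weight `lam`, and the use of exact (brute-force) search in place of the approximate search the abstract mentions are all assumptions made for illustration.

```python
import math

def knn_distribution(query, keys, values, vocab_size, k=2, temperature=1.0):
    """Build a next-token distribution from the k nearest datastore entries.

    keys   : stored context vectors (hypothetical toy datastore)
    values : the token id that followed each stored context
    """
    # Squared L2 distance from the query to every key (exact search;
    # a real system would likely use an approximate index instead).
    dists = [sum((a - b) ** 2 for a, b in zip(key, query)) for key in keys]
    nearest = sorted(range(len(keys)), key=lambda i: dists[i])[:k]

    # Softmax over negative distances; the temperature controls how
    # sharply the closest neighbors dominate.
    logits = [-dists[i] / temperature for i in nearest]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)

    # Neighbors that share the same target token pool their weight.
    probs = [0.0] * vocab_size
    for i, w in zip(nearest, weights):
        probs[values[i]] += w / total
    return probs

def interpolate(p_lm, p_knn, lam=0.25):
    """Mix the parametric LM distribution with the kNN distribution."""
    return [lam * q + (1.0 - lam) * p for p, q in zip(p_lm, p_knn)]

# Toy example: three stored contexts over a vocabulary of three tokens.
keys = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
values = [1, 2, 1]
query = [0.1, 0.0]
p_lm = [0.5, 0.3, 0.2]  # assumed output of the parametric LM

p_knn = knn_distribution(query, keys, values, vocab_size=3, k=2, temperature=1.0)
p_final = interpolate(p_lm, p_knn, lam=0.25)
```

Lowering `temperature` concentrates `p_knn` on the closest neighbor, while raising it spreads the mass more evenly across the retrieved entries, which is one way to see why the temperature can matter for the kNN distribution.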

