Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant. Challenges emerge with non-stationary training data streams, such as in continual learning. One powerful approach that has addressed this challenge involves pre-training large encoders on volumes of readily available data, followed by task-specific tuning. Given a new task, however, updating the weights of these encoders is challenging: a large number of weights need to be fine-tuned, and as a result, the models forget information about previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes. Our paradigm will be to encode; process the representation via a discrete bottleneck; and decode. Here, the input is fed to the pre-trained encoder, the output of the encoder is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and re-use a sparse number of these key-value pairs during inference, enabling localized and context-dependent model updates. We theoretically investigate the ability of the discrete key-value bottleneck to minimize the effect of learning under distribution shifts and show that it reduces the complexity of the hypothesis class. We empirically verify the proposed method under challenging class-incremental learning scenarios and show that the proposed model, without requiring any task boundaries, reduces catastrophic forgetting across a wide variety of pre-trained models, outperforming relevant baselines on this task.
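To make the encode, bottleneck, decode pipeline concrete, the following is a minimal PyTorch sketch of a single-codebook discrete key-value bottleneck. The class name, dimensions, and random key initialization here are illustrative assumptions rather than the paper's reference implementation (the full method, for instance, uses multiple codebooks operating on splits of the encoder representation).

```python
import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    """Minimal single-codebook sketch: snap encoder features to the
    nearest frozen key and return that key's learnable value."""

    def __init__(self, num_pairs: int, key_dim: int, value_dim: int):
        super().__init__()
        # Keys are fixed after initialization; only values receive gradients,
        # so updates stay localized to the pairs selected for each input.
        self.keys = nn.Parameter(torch.randn(num_pairs, key_dim),
                                 requires_grad=False)
        self.values = nn.Parameter(torch.randn(num_pairs, value_dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, key_dim) features from a frozen pre-trained encoder.
        dists = torch.cdist(z, self.keys)  # (batch, num_pairs)
        idx = dists.argmin(dim=1)          # index of nearest key per input
        return self.values[idx]            # fetched values, fed to the decoder

# Usage: encoder output -> bottleneck -> task-specific decoder.
encoder_out = torch.randn(8, 64)  # stand-in for pre-trained encoder features
bottleneck = DiscreteKeyValueBottleneck(num_pairs=512, key_dim=64, value_dim=32)
decoder = nn.Linear(32, 10)       # e.g. a 10-class classification head
logits = decoder(bottleneck(encoder_out))
```

Because gradients reach only the values selected for a given input, learning on new data updates a sparse, input-dependent subset of the bottleneck, which is what makes the model updates localized and context-dependent.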