We study neural-linear bandits for solving problems where {\em both} exploration and representation learning play an important role. Neural-linear bandits harness the representation power of Deep Neural Networks (DNNs) and combine it with efficient exploration mechanisms by leveraging the uncertainty estimates of a linear contextual bandit defined on top of the last hidden layer. To mitigate the problem of the representation changing during learning, new uncertainty estimates are computed using stored data from an unlimited buffer. However, when the amount of stored data is limited, a phenomenon known as catastrophic forgetting emerges. To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online. We applied our algorithm, Limited Memory Neural-Linear with Likelihood Matching (NeuralLinear-LiM2), to a variety of datasets and observed that it achieves performance comparable to the unlimited-memory approach while exhibiting resilience to catastrophic forgetting.