Data selection is essential for any data-driven optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-Value estimation and therefore cannot adapt their sampling strategy, i.e., the balance between exploring and exploiting transitions, to the complexity of the task. To address this, this paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. It is enabled by an uncertainty estimate of the Q-Value function, which guides the sampling toward more informative transitions and thus toward a more efficient policy. Experiments on classical control environments demonstrate stable results across environments and show that the proposed method outperforms state-of-the-art sampling strategies for dense rewards with respect to convergence and peak performance by 26% on average.
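A minimal sketch of what such an uncertainty-guided replay priority could look like, assuming the Q-Value uncertainty is estimated by an ensemble of Q-functions and the exploitation signal is the absolute TD error; the buffer layout, the mixing coefficient `beta`, and the tabular Q-tables are illustrative stand-ins, not the paper's implementation:

```python
# Hypothetical sketch: uncertainty-guided prioritized sampling from a replay buffer.
# Priority mixes an exploitation signal (|TD error|) with an exploration signal
# (ensemble disagreement). All names and the coefficient `beta` are assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, ensemble_size = 10, 2, 5
gamma, beta = 0.99, 1.0  # discount factor, exploration weight (assumed)

# Ensemble of independently initialized Q-tables as a stand-in for Q-networks.
q_ensemble = rng.normal(size=(ensemble_size, n_states, n_actions))

# Replay buffer of transitions (s, a, r, s').
buffer = [(rng.integers(n_states), rng.integers(n_actions),
           rng.normal(), rng.integers(n_states)) for _ in range(200)]

def priority(transition):
    s, a, r, s_next = transition
    q_sa = q_ensemble[:, s, a]                        # per-member Q(s, a)
    td_error = r + gamma * q_ensemble[:, s_next].max(axis=1) - q_sa
    exploitation = np.abs(td_error).mean()            # how wrong the current estimate is
    exploration = q_sa.std()                          # ensemble disagreement as uncertainty
    return exploitation + beta * exploration

# Sample a minibatch with probability proportional to priority.
p = np.array([priority(t) for t in buffer])
p = p / p.sum()
batch_idx = rng.choice(len(buffer), size=32, p=p, replace=False)
batch = [buffer[i] for i in batch_idx]
```

In this sketch, a large `beta` biases sampling toward transitions the ensemble disagrees on (exploration of uncertain regions), while a small `beta` recovers a TD-error-based prioritization (exploitation), mirroring the trade-off described above.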