MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling - Zhuanzhi Papers

Monte Carlo · Experience Replay · Uncertain · Uncertainty · Data Selection

April 17, 2023

MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling

Julius Ott, Lorenzo Servadei, Jose Arjona-Medina, Enrico Rinaldi, Gianfranco Mauro, Daniela Sánchez Lopera, Michael Stephan, Thomas Stadelmayer, Avik Santra, Robert Wille

From arXiv. Accepted at ICASSP 2023.

Data selection is essential for any data-based optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-Value estimation. Consequently, they cannot adapt the sampling strategies, including exploration and exploitation of transitions, to the complexity of the task. To address this, this paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. This is enabled by the uncertainty estimation of the Q-Value function, which guides the sampling to explore more significant transitions and, thus, learn a more efficient policy. Experiments on classical control environments demonstrate stable results across various environments. They show that the proposed method outperforms state-of-the-art sampling strategies for dense rewards w.r.t. convergence and peak performance by 26% on average.
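The abstract states only that sampling is guided by an uncertainty estimate of the Q-value function so that it can trade off exploration and exploitation of transitions. It does not give the priority formula. The sketch below shows one generic way to combine the two terms into replay priorities. It assumes the uncertainty is approximated by K Monte Carlo samples of each transition's TD error (for example from an ensemble or repeated dropout passes). The function names, the weight `c`, and the UCB-style score |mean| + c * std are illustrative assumptions, not the formula published for MEET.

```python
import random
from statistics import fmean, pstdev


def uncertainty_priorities(td_samples, c=1.0, eps=1e-6):
    """Compute one priority per transition from Monte Carlo TD-error samples.

    td_samples[i] holds K estimates of the TD error of transition i
    (assumed here to come from an ensemble or dropout passes; this is a
    hypothetical estimator, not necessarily the one used by MEET).
    priority = |mean| (exploitation) + c * std (exploration) + eps
    """
    priorities = []
    for samples in td_samples:
        mu = fmean(samples)      # expected TD error: exploitation signal
        sigma = pstdev(samples)  # spread of the estimates: uncertainty signal
        priorities.append(abs(mu) + c * sigma + eps)  # eps keeps every p > 0
    return priorities


def sampling_probabilities(priorities, alpha=0.6):
    """Turn priorities into a sampling distribution, p_i^alpha / sum_j p_j^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_batch(probs, batch_size, rng=random):
    """Draw transition indices with replacement according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


if __name__ == "__main__":
    # Transition 0: consistent TD error; transition 1: zero mean, high spread.
    td = [[1.0, 1.0, 1.0], [0.0, 2.0, -2.0]]
    print(uncertainty_priorities(td, c=0.0))  # exploitation only
    print(uncertainty_priorities(td, c=1.0))  # adds an exploration bonus
    probs = sampling_probabilities(uncertainty_priorities(td, c=1.0))
    print(sample_batch(probs, batch_size=4, rng=random.Random(0)))
```

With c = 0 the rule reduces to plain |TD-error| prioritization, and transition 1 is almost never drawn. With c = 1 its spread (about 1.63) gives it the higher priority even though its mean TD error is zero. A complete prioritized-replay implementation would also correct the resulting sampling bias with importance-sampling weights.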

Related content

Monte Carlo

Related VIP content

[2022 New Book] Efficient Deep Learning Book

Zhuanzhi Member Services

125+ reads · April 21, 2022

[ToG 2021] Deep Reinforcement Learning with Part-aware Exploration Bonus in Video Games

Zhuanzhi Member Services

16+ reads · March 29, 2022

[AAAI 2022] A Sample-Efficient Model-Based Conservative Actor-Critic Algorithm

Zhuanzhi Member Services

24+ reads · January 10, 2022

[AAAI 2021] Self-correcting Q-Learning

Zhuanzhi Member Services

17+ reads · December 4, 2020

[CMU PhD Thesis] Improving Deep Learning Training and Inference with Dynamic Hyperparameter Optimization

Zhuanzhi Member Services

55+ reads · May 26, 2020

[Oxford, ICLR 2020] VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

Zhuanzhi Member Services

25+ reads · February 28, 2020

[AAAI 2020 Tutorial] Exploration-Exploitation in Reinforcement Learning

Zhuanzhi Member Services

101+ reads · February 8, 2020

Latest Reinforcement Learning Tutorial, 17-page PDF

Zhuanzhi Member Services

182+ reads · October 11, 2019

Experience and Advice for Getting Started with Machine Learning

Zhuanzhi Member Services

94+ reads · October 10, 2019

[ALT 2019 Tutorials] Exploration-Exploitation in Reinforcement Learning

Zhuanzhi Member Services

34+ reads · March 21, 2019

Related news

A Collection of Reinforcement Learning Papers for Quantitative Finance

Zhuanzhi

14+ reads · December 18, 2019

Three Reinforcement Learning Papers: Avoiding Forgetting and More

CreateAMind

20+ reads · May 24, 2019

Hierarchically Structured Meta-learning

CreateAMind

27+ reads · May 22, 2019

Transferring Knowledge across Learning Processes

CreateAMind

29+ reads · May 18, 2019

Unsupervised Meta-Learning for Reinforcement Learning

CreateAMind

18+ reads · January 7, 2019

Unsupervised Learning via Meta-Learning

CreateAMind

43+ reads · January 3, 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+ reads · December 24, 2018

Forecasting Financial Time Series with Dynamic Deep Learning in Python

Quantitative Investment and Machine Learning

18+ reads · October 30, 2018

[Paper Recommendations] Six Recent Reinforcement Learning Papers: Sublinear, Machine Reading Comprehension, Accelerated Reinforcement Learning, Adversarial Reward Learning, Human-Computer Interaction

Zhuanzhi

17+ reads · April 28, 2018

A Family Tree of Reinforcement Learning

CreateAMind

26+ reads · August 2, 2017

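Related grants

Research on Optimal Pricing Strategies for Information Products and Value-Added Services

National Natural Science Foundation of China

1+ reads · December 31, 2014

Large-Deviation Risk Analysis Based on the Entropy Optimization Principle and Its Applications

National Natural Science Foundation of China

0+ reads · December 31, 2013

Robust Operational Optimization of Textile Industry Production Processes Based on Soft Sensing

National Natural Science Foundation of China

0+ reads · December 31, 2013

Bayesian Inference for Spatial Extreme-Value Models and Its Application to Climate Change Policy

National Natural Science Foundation of China

0+ reads · December 31, 2013

Analysis of Adaptive High-Order Stable Numerical Methods for Nonlinear Cahn-Hilliard-Type Equations

National Natural Science Foundation of China

0+ reads · December 31, 2013

Models and Algorithms for Two Classes of Portfolio Optimization Problems

National Natural Science Foundation of China

2+ reads · December 31, 2013

Enterprise Credit Evaluation Modeling with Dynamic Multi-SVM Ensembles over Class-Imbalanced, Time-Ordered Incremental Data Batches

National Natural Science Foundation of China

1+ reads · December 31, 2012

Hypothesis Testing for High-Dimensional Data

National Natural Science Foundation of China

0+ reads · December 31, 2012

Optimized Design Techniques for Built-In Self-Test (BIST) Based on Multi-Objective Evolutionary Algorithms

National Natural Science Foundation of China

0+ reads · December 31, 2008

Reinforcement Learning Control of Complex Continuous Systems Based on Support Vector Machines

National Natural Science Foundation of China

11+ reads · December 31, 2008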

Related papers

Monte-Carlo simulation method for the frequency comb spectrum of an atom laser

arXiv

0+ reads · June 3, 2023

GateON: an unsupervised method for large scale continual learning

arXiv

0+ reads · June 2, 2023

Adaptive Robotic Information Gathering via Non-Stationary Gaussian Processes

arXiv

0+ reads · June 2, 2023

Offline Meta Reinforcement Learning with In-Distribution Online Adaptation

arXiv

1+ reads · June 1, 2023

Non-stationary Reinforcement Learning under General Function Approximation

arXiv

0+ reads · June 1, 2023

Generalization for slowly mixing processes

arXiv

0+ reads · June 1, 2023

Normalization Enhances Generalization in Visual Reinforcement Learning

arXiv

0+ reads · June 1, 2023

Safe Offline Reinforcement Learning with Real-Time Budget Constraints

arXiv

0+ reads · June 1, 2023

Self-correcting Q-Learning

arXiv

11+ reads · December 2, 2020

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

arXiv

26+ reads · February 10, 2020
