In this paper, we consider the problem of adjusting the exploration rate when using value-of-information-based exploration. We do this by converting the value-of-information optimization into a problem of finding equilibria of a flow for a changing exploration rate. We then develop an efficient path-following scheme for converging to these equilibria and hence uncovering optimal action-selection policies. Under this scheme, the exploration rate is automatically adapted according to the agent's experiences. Global convergence is theoretically assured. We first evaluate our exploration-rate adaptation on the Nintendo GameBoy games Centipede and Millipede. We demonstrate aspects of the search process, such as its production of a hierarchy of state abstractions. We also show that our approach returns better policies in fewer episodes than conventional search strategies that rely on heuristic, annealing-based exploration-rate adjustments. We then illustrate that these trends hold for deep, value-of-information-based agents that learn to play ten simple games and over forty more complicated games for the Nintendo GameBoy system. Performance near or well above the level of human play is observed.
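To make the abstract's description concrete, the sketch below illustrates one plausible reading of a value-of-information-style soft policy and of the path-following idea: the policy is treated as a Boltzmann-like fixed point for a given exploration rate, and the exploration rate is then moved in small increments while re-solving for the equilibrium at each step. All function names, the specific objective, and the fixed-point iteration are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def voi_equilibrium(q, state_probs, beta, iters=200):
    """Equilibrium policy for one exploration rate `beta`: alternate between a
    Boltzmann-like conditional policy and its marginal action prior
    (a Blahut-Arimoto-style iteration). Purely illustrative."""
    n_states, n_actions = q.shape
    prior = np.full(n_actions, 1.0 / n_actions)
    for _ in range(iters):
        logits = np.log(prior)[None, :] + q / beta
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        policy = np.exp(logits)
        policy /= policy.sum(axis=1, keepdims=True)
        prior = state_probs @ policy                  # updated action marginal
    return policy, prior

def path_follow(q, state_probs, beta_start, beta_end, steps=50):
    """Illustrative path-following loop: change the exploration rate gradually
    and re-equilibrate the policy at each intermediate value, rather than
    jumping directly to the final exploration rate."""
    policy = None
    for beta in np.linspace(beta_start, beta_end, steps):
        policy, _ = voi_equilibrium(q, state_probs, beta)
    return policy
```

In this reading, the exploration rate plays the role of an inverse-temperature-like parameter, and the path over its values is what lets the agent transition smoothly from broad exploration to exploitation; the hierarchy of state abstractions mentioned in the abstract would emerge from the policies obtained at intermediate exploration rates.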