DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

Robustness · Conditional Mutual Information · Model Implementation · Effectiveness · Baselines

April 21, 2023

Shanchuan Wan, Yujin Tang, Yingtao Tian, Tomoyuki Kaneko

From arXiv. Accepted as a conference paper at the 32nd International Joint Conference on Artificial Intelligence (IJCAI-23).

Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness crucially determines the performance of RL algorithms, especially when extrinsic rewards are sparse. Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from the novelty of observations. However, there is a gap between the novelty of an observation and exploration in general, because both the stochasticity of the environment and the behavior of the agent may affect the observation. To estimate exploratory behaviors accurately, we propose DEIR, a novel method in which we theoretically derive an intrinsic reward from a conditional mutual information term that principally scales with the novelty contributed by agent explorations, and materialize the reward with a discriminative forward model. We conduct extensive experiments in both standard and hardened exploration games in MiniGrid to show that DEIR quickly learns a better policy than baselines. Our evaluations in ProcGen demonstrate both the generalization capability and the general applicability of our intrinsic reward.
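The abstract positions DEIR against earlier intrinsic rewards "estimated from novelty in observations." As a minimal sketch of that earlier idea only (this is not DEIR's derived reward, whose exact form is given in the paper), the hypothetical `EpisodicNoveltyBonus` class below scores an observation embedding by its squared Euclidean distance to the nearest embedding already seen in the current episode. It assumes an embedding function exists elsewhere. Plain Python lists stand in for its output, and the choice to give the first observation of an episode a bonus of 0 is an assumed convention.

```python
from typing import List, Sequence


class EpisodicNoveltyBonus:
    """Hypothetical observation-novelty bonus; NOT the DEIR reward.

    Keeps the embeddings seen in the current episode and scores a new
    embedding by its squared Euclidean distance to the nearest stored one.
    """

    def __init__(self) -> None:
        self.memory: List[List[float]] = []

    def reset(self) -> None:
        # Episodic memory: call at the start of every episode.
        self.memory.clear()

    def __call__(self, embedding: Sequence[float]) -> float:
        emb = [float(x) for x in embedding]
        if not self.memory:
            # Assumed convention: the first observation of an episode gets 0.
            bonus = 0.0
        else:
            # Squared distance to the nearest embedding stored so far.
            bonus = min(
                sum((a - b) ** 2 for a, b in zip(emb, past))
                for past in self.memory
            )
        self.memory.append(emb)
        return bonus


if __name__ == "__main__":
    bonus = EpisodicNoveltyBonus()
    print(bonus([0.0, 0.0]))  # 0.0: memory is empty
    print(bonus([3.0, 4.0]))  # 25.0: 3**2 + 4**2
    print(bonus([3.0, 4.0]))  # 0.0: revisiting a stored embedding
```

The bonus depends only on how different an observation looks. An observation that changes because of environment noise alone, with no exploratory action by the agent, therefore still earns a positive bonus. This is the gap between observation novelty and exploration that the abstract says DEIR's conditional mutual information formulation is meant to address.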
