The temporal Credit Assignment Problem (CAP) is a well-known and challenging task in AI. While Reinforcement Learning (RL), especially Deep RL, often works well when immediate rewards are available, it can fail when only delayed rewards are provided or when the reward function is noisy. In this work, we propose delegating the CAP to a Neural Network-based algorithm named InferNet that explicitly learns to infer the immediate rewards from the delayed rewards. The effectiveness of InferNet was evaluated on two online RL tasks, a simple GridWorld and 40 Atari games, and two offline RL tasks, GridWorld and a real-life Sepsis treatment task. For all tasks, the effectiveness of using the InferNet-inferred rewards is compared against that of using the immediate and the delayed rewards, under two settings: with and without reward noise. Overall, our results show that InferNet is robust against noisy reward functions and is an effective add-on mechanism for solving the temporal CAP in a wide range of RL tasks, from classic RL simulation environments to a real-world RL problem, and for both online and offline learning.
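To make the core idea concrete, the sketch below shows one plausible minimal realization of a network that "learns to infer the immediate rewards from the delayed rewards": a per-step reward predictor trained so that its predictions over an episode sum to the observed delayed (episode-end) reward. The architecture, optimizer settings, and episode format here are illustrative assumptions, not the authors' exact InferNet implementation.

```python
# A minimal sketch (assumed, not the authors' code): infer per-step rewards
# whose episode sum matches the delayed reward observed at episode end.
import torch
import torch.nn as nn

class RewardInferenceNet(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # states: (T, state_dim), actions: (T, action_dim) one-hot encodings.
        # Returns (T,) inferred immediate rewards for one episode.
        return self.net(torch.cat([states, actions], dim=-1)).squeeze(-1)

def train_step(model, optimizer, states, actions, delayed_reward):
    """One update: push the inferred per-step rewards to sum to the delayed reward."""
    optimizer.zero_grad()
    inferred = model(states, actions)              # (T,)
    loss = (inferred.sum() - delayed_reward) ** 2  # episode-sum constraint
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage on one synthetic episode of length T = 10.
T, state_dim, action_dim = 10, 4, 2
model = RewardInferenceNet(state_dim, action_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
states = torch.randn(T, state_dim)
actions = torch.eye(action_dim)[torch.randint(action_dim, (T,))]
delayed_reward = torch.tensor(1.0)  # observed only at the end of the episode
for _ in range(100):
    train_step(model, opt, states, actions, delayed_reward)
# The inferred per-step rewards can then stand in for the delayed reward
# when training a downstream RL agent, online or offline.
```

Under this reading, InferNet acts as an add-on reward-redistribution stage: any standard RL algorithm can consume the inferred rewards unchanged.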