Off-policy reinforcement learning (RL) has proven to be a powerful framework for guiding agents' actions in environments with stochastic rewards and unknown or noisy state dynamics. In many real-world settings, these agents must operate in multiple environments, each with slightly different dynamics. For example, we may be interested in developing policies to guide medical treatment for patients with and without a given disease, or policies for curriculum design for students with and without a learning disability. Here, we introduce nested policy fitted $Q$-iteration (NFQI), an RL framework that finds optimal policies in environments that exhibit such a structure. Our approach develops a nested $Q$-value function that exploits the structure shared between two groups of observations from separate environments while allowing their policies to remain distinct. We find that NFQI yields policies that rely on relevant features and perform at least as well as policies that ignore group structure. We demonstrate NFQI's performance on an OpenAI Gym environment and a clinical decision-making RL task. Our results suggest that NFQI can develop policies that are better suited to many real-world clinical environments.
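For intuition, the sketch below illustrates one plausible reading of the nested $Q$-value idea: a $Q$-function decomposed into a shared component plus a group-specific component, with standard fitted $Q$-iteration regression targets computed per group. This is not the paper's implementation; the class and function names (`NestedQ`, `fqi_targets`) and the linear parameterization are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of a nested Q-function:
# Q_g(s, a) = Q_shared(s, a) + Q_group_g(s, a), trained with fitted Q-iteration.
import numpy as np

class NestedQ:
    """Linear nested Q-function: one shared weight matrix plus one
    group-specific weight matrix per group (hypothetical parameterization)."""
    def __init__(self, n_features, n_actions, n_groups=2):
        self.shared = np.zeros((n_actions, n_features))
        self.group = np.zeros((n_groups, n_actions, n_features))

    def q_values(self, s, g):
        # Q_g(s, .) = (W_shared + W_g) s : shared structure plus group deviation
        return (self.shared + self.group[g]) @ s

def fqi_targets(model, transitions, gamma=0.99):
    """One fitted Q-iteration step: build regression targets
    y = r + gamma * max_a' Q_g(s', a'), using each transition's group g."""
    targets = []
    for (s, a, r, s_next, g, done) in transitions:
        y = r if done else r + gamma * model.q_values(s_next, g).max()
        targets.append((s, a, g, y))
    return targets

# Toy usage: two groups share features but receive group-indexed targets.
model = NestedQ(n_features=4, n_actions=2)
transitions = [(np.ones(4), 0, 1.0, np.ones(4), g, False) for g in (0, 1)]
print(fqi_targets(model, transitions))
```

Regressing the shared and group-specific weights jointly on such targets would let both groups pool statistical strength through the shared component while their induced greedy policies remain distinct, which is the behavior the abstract describes.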