Training Reinforcement Learning (RL) agents in high-stakes applications may be prohibitive due to the risk associated with exploration. Thus, the agent can only use data previously collected by safe policies. While previous work considers optimizing the average performance using offline data, we focus on optimizing a risk-averse criterion, namely the Conditional Value-at-Risk (CVaR). In particular, we present the Offline Risk-Averse Actor-Critic (O-RAAC), a model-free RL algorithm that is able to learn risk-averse policies in a fully offline setting. We show that O-RAAC learns policies with higher CVaR than risk-neutral approaches in different robot control tasks. Furthermore, considering risk-averse criteria guarantees distributional robustness of the average performance with respect to particular distribution shifts. We demonstrate empirically that in the presence of natural distribution shifts, O-RAAC learns policies with good average performance.
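For reference, a standard definition of the CVaR at level $\alpha \in (0,1]$ of a return random variable $Z$ is sketched below; this is the usual lower-tail convention for returns in the risk-averse RL literature, and the paper's exact sign and tail conventions may differ.

\begin{equation*}
% CVaR at level \alpha of a return Z with quantile function F_Z^{-1}
% (lower-tail convention for returns; the conditional-expectation form
%  coincides with the integral form when Z has a continuous distribution).
\operatorname{CVaR}_\alpha(Z)
\;=\; \frac{1}{\alpha} \int_0^{\alpha} F_Z^{-1}(\tau)\, \mathrm{d}\tau
\;=\; \mathbb{E}\!\left[\, Z \;\middle|\; Z \le \operatorname{VaR}_\alpha(Z) \,\right],
\end{equation*}

where $\operatorname{VaR}_\alpha(Z) = \inf\{ z : F_Z(z) \ge \alpha \}$ is the $\alpha$-quantile of $Z$. Maximizing $\operatorname{CVaR}_\alpha$ thus maximizes the expected return over the worst $\alpha$-fraction of outcomes, which recovers the risk-neutral objective $\mathbb{E}[Z]$ as $\alpha \to 1$.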