Deploying Reinforcement Learning (RL) agents in real-world applications often requires satisfying complex system constraints. The constraint thresholds are frequently set incorrectly, either because of the complexity of the system or because the thresholds cannot be verified offline (e.g., no simulator or reasonable offline evaluation procedure exists). As a result, the task may be unsolvable without violating the constraints. However, in many real-world settings, constraint violations are undesirable yet not catastrophic, motivating the need for soft-constrained RL approaches. We present two soft-constrained RL approaches that use meta-gradients to find a good trade-off between maximizing expected return and minimizing constraint violations. We demonstrate the effectiveness of these approaches by showing that they consistently outperform the baselines across four different Mujoco domains.
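To make the return/violation trade-off concrete, below is a minimal illustrative sketch, not the meta-gradient method described above, of a soft-constrained objective in which a penalty coefficient is adapted online with a simple Lagrangian-style dual update. The `evaluate_policy` function, the threshold, and all numeric values are hypothetical stand-ins for statistics that would normally come from policy rollouts in an environment such as a Mujoco task.

```python
import numpy as np

def evaluate_policy(penalty_coef, rng):
    # Hypothetical stand-in for rolling out the current policy:
    # a higher penalty coefficient trades return for fewer violations.
    ret = 100.0 - 20.0 * penalty_coef + rng.normal(0.0, 1.0)
    cost = max(0.0, 5.0 - 4.0 * penalty_coef + rng.normal(0.0, 0.5))
    return ret, cost

cost_threshold = 1.0   # possibly mis-specified constraint threshold
penalty_coef = 0.1     # trade-off coefficient (lambda), adapted online
lr_lambda = 0.05       # step size for the dual update
rng = np.random.default_rng(0)

for step in range(200):
    ret, cost = evaluate_policy(penalty_coef, rng)
    violation = cost - cost_threshold
    # Soft-constrained objective: return minus penalized constraint violation.
    soft_objective = ret - penalty_coef * max(0.0, violation)
    # Dual-style update: increase the penalty when the constraint is violated,
    # decrease it otherwise, keeping the coefficient non-negative.
    penalty_coef = max(0.0, penalty_coef + lr_lambda * violation)

print(f"final penalty coefficient: {penalty_coef:.3f}")
```

The sketch only illustrates why a fixed, possibly mis-set threshold is problematic: the trade-off coefficient itself must be tuned from experience, which is the role the meta-gradient approaches play in the full method.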