平均受约束政策优化 (Average-Constrained Policy Optimization) - 专知论文

会员服务 ·

0

优化器 · Performer · 准则 · 情景 · Extensibility ·

2023 年 2 月 2 日

Average-Constrained Policy Optimization

翻译：平均受约束政策优化

Akhil Agnihotri,Rahul Jain,Haipeng Luo

from arxiv, 18 pages

Reinforcement Learning (RL) with constraints is becoming an increasingly important problem for various applications. Often, the average criterion is more suitable. Yet, RL for average criterion-constrained MDPs remains a challenging problem. Algorithms designed for discounted constrained RL problems often do not perform well for the average CMDP setting. In this paper, we introduce a new (possibly the first) policy optimization algorithm for constrained MDPs with the average criterion. The Average-Constrained Policy Optimization (ACPO) algorithm is inspired by the famed PPO-type algorithms based on trust region methods. We develop basic sensitivity theory for average MDPs, and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging MuJoCo environments, show the superior performance of the algorithm when compared to other state-of-the-art algorithms adapted for the average CMDP setting.

翻译：受制约的强化学习(RL)正在成为各种应用中越来越重要的问题。通常,平均标准更适合。然而,平均标准限制的 MDP 的RL 仍是一个具有挑战性的问题。为折扣的受限制RL问题设计的分类对于平均 CMDP 环境来说往往效果不佳。在本文中,我们为有平均标准的限制 MDP 引入了新的(可能是第一种)政策优化算法。平均限制的政策优化算法(ACPO)受以信任区域方法为基础的著名的PPPO型算法的启发。我们为平均 MDP 开发基本敏感度理论,然后在算法的设计中使用相应的界限。我们从理论上保证其性能,并通过在具有挑战性的各种MuJoCo环境中开展广泛的实验工作,显示与适应平均 CMDP 环境的其他最先进的算法相比,算法表现优。

0

相关内容

优化器

JCIM丨DRlinker：深度强化学习优化片段连接设计

JCIM丨DRlinker：深度强化学习优化片段连接设计

专知会员服务

7+阅读 · 2022年12月9日

机器学习组合优化

机器学习组合优化

专知会员服务

110+阅读 · 2021年2月16日

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

专知会员服务

17+阅读 · 2019年12月9日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

皮肤间充质干细胞通过其分泌的sTNFR1抑制Th17细胞的分化

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

多任务学习的理论分析与应用

国家自然科学基金

6+阅读 · 2013年12月31日

面向移动IPTV直播频道可用度的优化技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

流形上的Bakry-Emery曲率，泛函不等式和热核分析

国家自然科学基金

0+阅读 · 2012年12月31日

一类时滞非线性系统随机动力学与控制

国家自然科学基金

0+阅读 · 2012年12月31日

面向属性的CPN建模及On the Fly辅助的测试生成方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

粒子群优化算法的收敛机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

遍历哈密顿系统的谱理论

国家自然科学基金

0+阅读 · 2009年12月31日

Regularization of polynomial networks for image recognition

Arxiv

0+阅读 · 2023年3月24日

Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization

Arxiv

0+阅读 · 2023年3月23日

Preference-Aware Constrained Multi-Objective Bayesian Optimization

Arxiv

0+阅读 · 2023年3月23日

Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access

Arxiv

0+阅读 · 2023年3月22日

Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees

Arxiv

0+阅读 · 2023年3月22日

Exponential Consistency of M-estimators in Generalized Linear Mixed Models

Arxiv

0+阅读 · 2023年3月22日

Dynamic neighbourhood optimisation for task allocation using multi-agent

Arxiv

101+阅读 · 2022年5月11日

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

Arxiv

13+阅读 · 2021年12月20日

Multi-task Learning of Order-Consistent Causal Graphs

Arxiv

10+阅读 · 2021年11月3日

Deep Learning on Image Denoising: An overview

Arxiv

13+阅读 · 2020年8月3日

VIP会员

文章信息

相关主题

相关VIP内容

JCIM丨DRlinker：深度强化学习优化片段连接设计

JCIM丨DRlinker：深度强化学习优化片段连接设计

专知会员服务

7+阅读 · 2022年12月9日

机器学习组合优化

机器学习组合优化

专知会员服务

110+阅读 · 2021年2月16日

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

专知会员服务

17+阅读 · 2019年12月9日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《物联网（IoT）中的无人机通信高效控制》135页

《在GNSS信号降级环境中利用共识实现无人机集群稳健协调》

中程单向攻击无人机的战略意义：俄乌战争启示

《面向无人机集群的避障动态传感器覆盖算法》最新38页

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

相关论文

Regularization of polynomial networks for image recognition

Arxiv

0+阅读 · 2023年3月24日

Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization

Arxiv

0+阅读 · 2023年3月23日

Preference-Aware Constrained Multi-Objective Bayesian Optimization

Arxiv

0+阅读 · 2023年3月23日

Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access

Arxiv

0+阅读 · 2023年3月22日

Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees

Arxiv

0+阅读 · 2023年3月22日

Exponential Consistency of M-estimators in Generalized Linear Mixed Models

Arxiv

0+阅读 · 2023年3月22日

Dynamic neighbourhood optimisation for task allocation using multi-agent

Arxiv

101+阅读 · 2022年5月11日

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

Arxiv

13+阅读 · 2021年12月20日

Multi-task Learning of Order-Consistent Causal Graphs

Arxiv

10+阅读 · 2021年11月3日

Deep Learning on Image Denoising: An overview

Arxiv

13+阅读 · 2020年8月3日

相关基金

皮肤间充质干细胞通过其分泌的sTNFR1抑制Th17细胞的分化

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

多任务学习的理论分析与应用

国家自然科学基金

6+阅读 · 2013年12月31日

面向移动IPTV直播频道可用度的优化技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

流形上的Bakry-Emery曲率，泛函不等式和热核分析

国家自然科学基金

0+阅读 · 2012年12月31日

一类时滞非线性系统随机动力学与控制

国家自然科学基金

0+阅读 · 2012年12月31日

面向属性的CPN建模及On the Fly辅助的测试生成方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

粒子群优化算法的收敛机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

遍历哈密顿系统的谱理论

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员