Interpreting Primal-Dual Algorithms for Constrained MARL


December 1, 2022


Daniel Tabas, Ahmed S. Zamzam, Baosen Zhang

From arXiv. 19 pages, 8 figures. Submitted to L4DC 2023.

Constrained multiagent reinforcement learning (C-MARL) is gaining importance as MARL algorithms find new applications in real-world systems ranging from energy systems to drone swarms. Most C-MARL algorithms use a primal-dual approach to enforce constraints through a penalty function added to the reward. In this paper, we study the structural effects of this penalty term on the MARL problem. First, we show that the standard practice of using the constraint function as the penalty leads to a weak notion of safety. However, by making simple modifications to the penalty term, we can enforce meaningful probabilistic (chance and conditional value at risk) constraints. Second, we quantify the effect of the penalty term on the value function, uncovering an improved value estimation procedure. We use these insights to propose a constrained multiagent advantage actor critic (C-MAA2C) algorithm. Simulations in a simple constrained multiagent environment affirm that our reinterpretation of the primal-dual method in terms of probabilistic constraints is effective, and that our proposed value estimate accelerates convergence to a safe joint policy.
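As a rough illustration of the primal-dual idea described in the abstract, the sketch below subtracts a multiplier-weighted penalty from the reward and updates the multiplier by projected dual ascent. It is a generic textbook-style sketch, not the paper's C-MAA2C algorithm. The function names (`penalty`, `penalized_reward`, `dual_step`), the sign convention (a constraint value `c <= 0` means "satisfied"), and the three penalty forms are assumptions made for illustration. The "chance" variant uses an indicator of violation, and the "cvar" variant uses the standard Rockafellar-Uryasev expression with an auxiliary threshold `t`; the exact modifications used in the paper may differ.

```python
import numpy as np


def penalty(c, kind="expectation", t=0.0, beta=0.9):
    """Map constraint values c (assumed: c <= 0 is satisfied) to a penalty."""
    c = np.asarray(c, dtype=float)
    if kind == "expectation":
        # Constraint function itself as the penalty (the standard practice).
        return c
    if kind == "chance":
        # Indicator of violation; its sample mean estimates P(c > 0).
        return (c > 0.0).astype(float)
    if kind == "cvar":
        # Rockafellar-Uryasev integrand: t + (c - t)^+ / (1 - beta).
        return t + np.maximum(c - t, 0.0) / (1.0 - beta)
    raise ValueError(f"unknown penalty kind: {kind}")


def penalized_reward(r, c, lam, **kw):
    """Reward with the multiplier-weighted penalty subtracted."""
    return np.asarray(r, dtype=float) - lam * penalty(c, **kw)


def dual_step(lam, c, budget, lr=0.1, **kw):
    """Projected dual ascent: raise lam when the mean penalty exceeds the budget."""
    return max(0.0, lam + lr * (penalty(c, **kw).mean() - budget))


# Hypothetical batch of per-step rewards and constraint values.
r = np.ones(4)
c = np.array([-1.0, 0.5, -0.2, 2.0])
shaped = penalized_reward(r, c, lam=2.0, kind="chance")
new_lam = dual_step(0.0, c, budget=0.25, kind="chance")
```

With the "chance" penalty, the multiplier grows whenever the empirical violation frequency exceeds `budget`, so the penalty targets the probability of violation rather than the average constraint value.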

