Learning Generalizable Risk-Sensitive Policies to Coordinate in Decentralized Multi-Agent General-Sum Games - 专知论文


Learning · Agent · Estimation/Estimator · Generalization theory · Knowledge

September 24, 2022

Learning Generalizable Risk-Sensitive Policies to Coordinate in Decentralized Multi-Agent General-Sum Games


Ziyi Liu, Xian Guo, Yongchun Fang

While various multi-agent reinforcement learning methods have been proposed for cooperative settings, few works investigate how self-interested learning agents achieve mutual coordination in decentralized general-sum games and generalize pre-trained policies to non-cooperative opponents during execution. In this paper, we present Generalizable Risk-Sensitive Policy (GRSP). GRSP learns the distribution over each agent's return and estimates a dynamic risk-seeking bonus to discover risky coordination strategies. Furthermore, to avoid overfitting to training opponents, GRSP learns an auxiliary opponent modeling task to infer opponents' types and dynamically alter its strategies accordingly during execution. Empirically, agents trained via GRSP stably achieve mutual coordination during training and avoid being exploited by non-cooperative opponents during execution. To the best of our knowledge, it is the first method to learn coordination strategies between agents in both the iterated prisoner's dilemma (IPD) and the iterated stag hunt (ISH) without shaping opponents or rewards, and the first to consider generalization during execution. Furthermore, we show that GRSP can be scaled to high-dimensional settings.
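To make the idea of a risk-seeking bonus concrete, here is a minimal sketch in Python. It is not the GRSP algorithm from the paper. The payoff values, the upper-tail bonus definition, the weight `beta`, and all function names are assumptions chosen only for illustration. It shows how adding a bonus computed from the upper tail of a return distribution can tip an agent from the safe action (hare) to the risky coordination action (stag) in a one-shot stag hunt.

```python
from statistics import mean

# Hypothetical stag-hunt payoffs for one player (illustrative values, not from the paper).
# Key: (my_action, partner_action); 0 = hunt stag (risky), 1 = hunt hare (safe).
STAG_HUNT = {
    (0, 0): 4.0,   # both hunt stag: best joint outcome
    (0, 1): -3.0,  # I hunt stag alone: worst outcome for me
    (1, 0): 1.0,   # I hunt hare while the partner hunts stag
    (1, 1): 1.0,   # both hunt hare
}


def return_quantiles(action, partner_stag_prob, n=100):
    """Stand-in for a learned return distribution: n sorted payoff samples."""
    n_stag = round(partner_stag_prob * n)
    samples = [STAG_HUNT[(action, 0)]] * n_stag
    samples += [STAG_HUNT[(action, 1)]] * (n - n_stag)
    return sorted(samples)


def risk_seeking_bonus(quantiles, tail=0.25):
    """Assumed bonus: mean of the upper tail minus the overall mean (never negative)."""
    k = max(1, int(len(quantiles) * tail))
    return mean(quantiles[-k:]) - mean(quantiles)


def choose_action(partner_stag_prob, beta):
    """Pick the action with the highest mean return plus beta times the bonus."""
    scores = []
    for action in (0, 1):
        q = return_quantiles(action, partner_stag_prob)
        scores.append(mean(q) + beta * risk_seeking_bonus(q))
    return 0 if scores[0] > scores[1] else 1


if __name__ == "__main__":
    # The partner is assumed to hunt stag 40% of the time.
    print("risk-neutral choice:", choose_action(0.4, beta=0.0))
    print("risk-seeking choice:", choose_action(0.4, beta=1.0))
```

With `beta=0.0` the agent compares expected returns only and prefers the safe action. A positive `beta` rewards actions whose return distribution has a high upper tail, which is what makes the risky coordination outcome discoverable in this toy setting.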

