In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), shape their opponents' learning process. However, these methods are myopic since only a small number of steps can be anticipated, are asymmetric since they treat other agents as naive learners, and require the use of higher-order derivatives, which are calculated through white-box access to an opponent's differentiable learning algorithm. To address these issues, we propose Model-Free Opponent Shaping (M-FOS). M-FOS learns in a meta-game in which each meta-step is an episode of the underlying ("inner") game. The meta-state consists of the inner policies, and the meta-policy produces a new inner policy to be used in the next episode. M-FOS then uses generic model-free optimisation methods to learn meta-policies that accomplish long-horizon opponent shaping. Empirically, M-FOS near-optimally exploits naive learners and other, more sophisticated algorithms from the literature. For example, to the best of our knowledge, it is the first method to learn the well-known Zero-Determinant (ZD) extortion strategy in the IPD. In the same settings, M-FOS leads to socially optimal outcomes under meta-self-play. Finally, we show that M-FOS can be scaled to high-dimensional settings.
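To make the meta-game construction concrete, the following is a minimal sketch of the idea on the IPD, assuming memory-one inner policies (5 logits each), a naive-learning opponent updated by finite-difference gradient ascent, and simple evolution strategies standing in for the "generic model-free optimisation" at the meta level. All names, the linear meta-policy parameterisation, and the hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (not the paper's implementation): memory-one IPD inner
# policies, a naive-learning opponent, and evolution strategies (ES) as the
# generic model-free meta-optimizer.  All names and settings are assumptions.
import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.96                        # inner-game discount factor
R1 = np.array([-1., -3., 0., -2.])  # agent-1 payoffs in states CC, CD, DC, DD
R2 = np.array([-1., 0., -3., -2.])  # agent-2 payoffs (same game, roles swapped)

def ipd_return(theta1, theta2):
    """Exact discounted per-step IPD returns for two memory-one policies.
    Each theta holds 5 logits: [initial, after CC, after CD, after DC, after DD]."""
    p = 1 / (1 + np.exp(-np.asarray(theta1, float)))
    q = 1 / (1 + np.exp(-np.asarray(theta2, float)))
    p0, pS = p[0], p[1:]
    q0, qS = q[0], q[1:][[0, 2, 1, 3]]  # re-index opponent states to agent 1's view
    s0 = np.array([p0 * q0, p0 * (1 - q0), (1 - p0) * q0, (1 - p0) * (1 - q0)])
    M = np.stack([pS * qS, pS * (1 - qS), (1 - pS) * qS, (1 - pS) * (1 - qS)], axis=1)
    v = s0 @ np.linalg.inv(np.eye(4) - GAMMA * M)   # discounted state visitation
    return (1 - GAMMA) * v @ R1, (1 - GAMMA) * v @ R2

def naive_grad(theta1, theta2, eps=1e-4):
    """Finite-difference gradient of the opponent's return w.r.t. its own policy."""
    g = np.zeros(5)
    for i in range(5):
        d = np.zeros(5); d[i] = eps
        g[i] = (ipd_return(theta1, theta2 + d)[1]
                - ipd_return(theta1, theta2 - d)[1]) / (2 * eps)
    return g

def meta_policy(W, theta1, theta2):
    """Meta-policy: maps the meta-state (both inner policies) to the shaping
    agent's inner policy for the next inner episode (here just a linear map)."""
    return W @ np.concatenate([theta1, theta2, [1.0]])

def meta_episode(W, T=50, opp_lr=1.0):
    """One meta-episode: T inner episodes against a naive learner."""
    theta1, theta2 = np.zeros(5), np.zeros(5)
    total = 0.0
    for _ in range(T):
        theta1 = meta_policy(W, theta1, theta2)                 # meta-step
        r1, _ = ipd_return(theta1, theta2)                      # inner episode
        theta2 = theta2 + opp_lr * naive_grad(theta1, theta2)   # naive opponent update
        total += r1
    return total / T

# Model-free meta-optimization of the meta-policy parameters with simple ES.
W, sigma, alpha, pop = np.zeros((5, 11)), 0.1, 0.02, 16
for it in range(100):
    noise = rng.normal(size=(pop,) + W.shape)
    scores = np.array([meta_episode(W + sigma * n) for n in noise])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    W = W + alpha / (pop * sigma) * np.einsum("i,ijk->jk", scores, noise)
    if it % 20 == 0:
        print(f"meta-iter {it:3d}  avg per-step return {meta_episode(W):+.3f}")
```

Because the meta-objective is evaluated only through inner-episode returns, the meta-optimizer needs no derivatives of, or white-box access to, the opponent's learning rule; here the naive learner could be swapped for any other learning algorithm without changing the meta-training loop.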