A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games - Zhuanzhi (专知) Papers


Learning · Atari · Markov · Deep Reinforcement Learning · Agent

March 7, 2023


Zihan Ding,Dijia Su,Qinghua Liu,Chi Jin

This paper proposes new, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games. Unlike prior efforts that train agents to beat a fixed set of opponents, our objective is to find Nash equilibrium policies that cannot be exploited even by adversarial opponents. We propose (a) the Nash-DQN algorithm, which integrates deep learning techniques from single-agent DQN into the classic Nash Q-learning algorithm for solving tabular Markov games; and (b) the Nash-DQN-Exploiter algorithm, which additionally adopts an exploiter to guide the exploration of the main agent. We conduct experimental evaluations on tabular examples as well as various two-player Atari games. Our empirical results demonstrate that (i) the policies found by many existing methods, including Neural Fictitious Self-Play and Policy Space Response Oracle, can be prone to exploitation by adversarial opponents; and (ii) the output policies of our algorithms are robust to exploitation, and thus outperform existing methods.
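To make the ideas in the abstract concrete, here is a minimal tabular sketch of the two quantities it relies on: a Nash Q-learning style backup target and the exploitability of a policy. It is not the authors' implementation. The function names (`solve_2x2_zero_sum`, `nash_q_target`, `exploitability`) are hypothetical, and the restriction to 2x2 stage games with a closed-form solver is an assumption made only to keep the example dependency-free. The abstract does not say which equilibrium solver the paper uses.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_2x2_zero_sum(q: Matrix) -> Tuple[float, List[float]]:
    """Return (game value, row player's maximin strategy) for a 2x2 payoff
    matrix q, where the row player maximizes and the column player minimizes."""
    (a, b), (c, d) = q
    maximin = max(min(a, b), min(c, d))  # best pure guarantee for the row player
    minimax = min(max(a, c), max(b, d))  # best pure guarantee for the column player
    if maximin == minimax:               # a pure saddle point exists
        row = [1.0, 0.0] if min(a, b) >= min(c, d) else [0.0, 1.0]
        return maximin, row
    denom = a - b - c + d                # nonzero whenever there is no saddle point
    p = (d - c) / denom                  # probability of row 0 that makes the column player indifferent
    return (a * d - b * c) / denom, [p, 1.0 - p]


def nash_q_target(reward: float, next_q: Matrix, gamma: float = 0.99,
                  done: bool = False) -> float:
    """Backup target: reward plus the discounted Nash value of the
    next state's Q matrix (instead of a max, as in single-agent Q-learning)."""
    if done:
        return reward
    value, _ = solve_2x2_zero_sum(next_q)
    return reward + gamma * value


def exploitability(q: Matrix, row_strategy: List[float], game_value: float) -> float:
    """How much the row player loses, relative to the game value, against a
    best-responding column player. Zero means the strategy is non-exploitable."""
    worst_case = min(
        sum(p * q[i][j] for i, p in enumerate(row_strategy))
        for j in range(len(q[0]))
    )
    return game_value - worst_case


if __name__ == "__main__":
    pennies = [[1.0, -1.0], [-1.0, 1.0]]  # matching pennies
    value, nash_row = solve_2x2_zero_sum(pennies)
    print("game value:", value, "Nash strategy:", nash_row)
    print("exploitability of Nash strategy:", exploitability(pennies, nash_row, value))
    print("exploitability of always playing row 0:", exploitability(pennies, [1.0, 0.0], value))
```

In matching pennies the uniform strategy cannot be exploited, while a deterministic strategy loses a full point to a best-responding opponent. That gap is the kind of weakness the abstract attributes to policies that were only trained against a fixed set of opponents.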

