Multi-agent simulations provide a scalable environment for learning policies that interact with rational agents. However, such policies may fail to generalize to the real world, where agents may differ from their simulated counterparts due to unmodeled irrationality and misspecified reward functions. We introduce Epsilon-Robust Multi-Agent Simulation (ERMAS), a robust optimization framework for learning AI policies that are robust to such multi-agent sim-to-real gaps. While existing notions of multi-agent robustness concern perturbations in the actions of agents, we address a novel robustness objective concerning perturbations in the reward functions of agents. ERMAS provides this robustness by anticipating suboptimal behaviors from other agents, formalized as the worst-case epsilon-equilibrium. We show empirically that ERMAS yields robust policies for repeated bimatrix games and optimal taxation problems in economic simulations. In particular, in the two-level RL problem posed by the AI Economist (Zheng et al., 2020), ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
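As a rough sketch (our notation, not drawn from the abstract itself), the worst-case epsilon-equilibrium objective described above can be read as a max-min problem: the planner policy $\theta$ is optimized against the worst joint agent behavior $\pi$ that stays within $\epsilon$ of every agent's best response.

% Hedged sketch only; J_p (planner return), J_i (agent i's return), and E_eps are assumed notation.
\begin{equation*}
\max_{\theta} \;\; \min_{\pi \in \mathcal{E}_{\epsilon}(\theta)} J_{p}(\theta, \pi),
\qquad
\mathcal{E}_{\epsilon}(\theta) = \Bigl\{ \pi \;:\; J_{i}(\pi_{i}, \pi_{-i}; \theta) \;\ge\; \max_{\pi_{i}'} J_{i}(\pi_{i}', \pi_{-i}; \theta) - \epsilon \;\; \forall i \Bigr\},
\end{equation*}

where $\mathcal{E}_{\epsilon}(\theta)$ collects the epsilon-equilibria of the agent population under planner policy $\theta$; all symbols here are illustrative assumptions rather than the paper's own formalization.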