Diversity Induced Environment Design via Self-Play - Zhuanzhi Papers

Topics: Episode · Self-Play · Diversity · Design · Agent

March 21, 2023

Diversity Induced Environment Design via Self-Play

Dexun Li, Wenjun Li, Pradeep Varakantham

Recent work on designing an appropriate distribution of environments has shown promise for training effective generally capable agents. Its success is partly because of a form of adaptive curriculum learning that generates environment instances (or levels) at the frontier of the agent's capabilities. However, such an environment design framework often struggles to find effective levels in challenging design spaces and requires costly interactions with the environment. In this paper, we aim to introduce diversity in the Unsupervised Environment Design (UED) framework. Specifically, we propose a task-agnostic method to identify observed/hidden states that are representative of a given level. The outcome of this method is then utilized to characterize the diversity between two levels, which as we show can be crucial to effective performance. In addition, to improve sampling efficiency, we incorporate the self-play technique that allows the environment generator to automatically generate environments that are of great benefit to the training agent. Quantitatively, our approach, Diversity-induced Environment Design via Self-Play (DivSP), shows compelling performance over existing methods.
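
The abstract does not say how the diversity between two levels is computed, so the following is only an illustrative sketch of the general idea, not the DivSP method. It assumes each level has already been summarized by a small set of representative state vectors, and it uses a symmetric average nearest-neighbour (Chamfer-style) distance as a stand-in diversity score. The names `level_diversity` and `pick_most_diverse`, and the level buffer, are hypothetical.

```python
import math
from typing import Dict, List, Sequence

State = Sequence[float]  # assumption: a representative state is a flat feature vector


def _nearest(state: State, others: Sequence[State]) -> float:
    """Euclidean distance from `state` to the closest state in `others`."""
    return min(math.dist(state, other) for other in others)


def level_diversity(states_a: Sequence[State], states_b: Sequence[State]) -> float:
    """Symmetric average nearest-neighbour distance between two state sets.

    This is a stand-in metric chosen for illustration; the paper may use a
    different measure.
    """
    if not states_a or not states_b:
        raise ValueError("each level needs at least one representative state")
    a_to_b = sum(_nearest(s, states_b) for s in states_a) / len(states_a)
    b_to_a = sum(_nearest(s, states_a) for s in states_b) / len(states_b)
    return 0.5 * (a_to_b + b_to_a)


def pick_most_diverse(
    candidates: Dict[str, List[State]], buffer: List[List[State]]
) -> str:
    """Return the candidate level that is farthest from its closest buffered level."""
    return max(
        candidates,
        key=lambda name: min(level_diversity(candidates[name], b) for b in buffer),
    )


if __name__ == "__main__":
    buffer = [[(0.0, 0.0)]]  # one level already in the (hypothetical) buffer
    candidates = {"near": [(1.0, 0.0)], "far": [(3.0, 4.0)]}
    print(pick_most_diverse(candidates, buffer))
```

Under these assumptions, a generator could prefer candidate levels that score high against everything already in its buffer, which is one simple way to bias a curriculum toward diverse levels.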
