This work explores learning agent-agnostic synthetic environments (SEs) for Reinforcement Learning. SEs act as a proxy for target environments and allow agents to be trained more efficiently than when trained directly on the target environment. We formulate this as a bi-level optimization problem and represent an SE as a neural network. Using Natural Evolution Strategies and a population of SE parameter vectors, we train agents in the inner loop on evolving SEs, while in the outer loop we use their performance on the target task as a score for meta-updating the SE population. We show empirically that our method is capable of learning SEs for two discrete-action-space tasks (CartPole-v0 and Acrobot-v1) that allow us to train agents more robustly and with up to 60% fewer steps. Not only do we show, in experiments with 4000 evaluations, that the SEs are robust against hyperparameter changes such as learning rate, batch size, and network size, but we also show that SEs trained with DDQN agents transfer in limited ways to a discrete-action-space version of TD3 and very well to Dueling DDQN.
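To make the bi-level structure concrete, below is a minimal sketch of the outer-loop Natural Evolution Strategies update over an SE parameter vector. It is not the paper's implementation: the function inner_train_and_eval is a hypothetical placeholder that, in the actual method, would train a DDQN agent on the synthetic environment and return its performance on the target task; here it is replaced by a dummy surrogate score so the sketch runs end-to-end.

```python
import numpy as np


def inner_train_and_eval(se_params: np.ndarray) -> float:
    """Hypothetical stand-in for the inner loop.

    In the actual method this would train a DDQN agent on the SE defined by
    `se_params` and return its episodic return on the target task
    (e.g. CartPole-v0). Here we use a dummy surrogate so the sketch runs.
    """
    target = np.linspace(-1.0, 1.0, se_params.size)
    return -float(np.sum((se_params - target) ** 2))


def nes_meta_loop(dim: int = 16, pop_size: int = 16, sigma: float = 0.1,
                  lr: float = 0.05, iterations: int = 200) -> np.ndarray:
    """Outer-loop NES meta-update of the SE parameter vector (a sketch)."""
    theta = np.zeros(dim)                    # current SE parameters
    rng = np.random.default_rng(0)
    for _ in range(iterations):
        # Sample a population of perturbations of the SE parameters.
        noise = rng.standard_normal((pop_size, dim))
        scores = np.array([inner_train_and_eval(theta + sigma * eps)
                           for eps in noise])
        # Centered-rank transform for a more robust gradient estimate.
        ranks = scores.argsort().argsort().astype(np.float64)
        weights = ranks / (pop_size - 1) - 0.5
        # NES gradient estimate and meta-update of the SE parameters.
        grad = (weights[:, None] * noise).sum(axis=0) / (pop_size * sigma)
        theta += lr * grad
    return theta


if __name__ == "__main__":
    final_params = nes_meta_loop()
    print("final surrogate score:", inner_train_and_eval(final_params))
```

The population size, perturbation scale sigma, and learning rate shown here are illustrative values, not the ones used in the experiments.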