平行逆向课程编制 (Parallelized Reverse Curriculum Generation) - 专知论文

会员服务 ·

0

可交换的 · Performer · 泛化理论 · 学成 · Pair ·

2021 年 8 月 4 日

Parallelized Reverse Curriculum Generation

翻译：平行逆向课程编制

Zih-Yun Chiu,Yi-Lin Tuan,Hung-yi Lee,Li-Chen Fu

For reinforcement learning (RL), it is challenging for an agent to master a task that requires a specific series of actions due to sparse rewards. To solve this problem, reverse curriculum generation (RCG) provides a reverse expansion approach that automatically generates a curriculum for the agent to learn. More specifically, RCG adapts the initial state distribution from the neighborhood of a goal to a distance as training proceeds. However, the initial state distribution generated for each iteration might be biased, thus making the policy overfit or slowing down the reverse expansion rate. While training RCG for actor-critic (AC) based RL algorithms, this poor generalization and slow convergence might be induced by the tight coupling between an AC pair. Therefore, we propose a parallelized approach that simultaneously trains multiple AC pairs and periodically exchanges their critics. We empirically demonstrate that this proposed approach can improve RCG in performance and convergence, and it can also be applied to other AC based RL algorithms with adapted initial state distribution.

翻译：对于强化学习(RL)来说,对于代理人来说,要完成一项需要因微薄的回报而采取一系列特定行动的任务是具有挑战性的。为了解决这个问题,反向课程生成(RCG)提供了一种反向扩展方法,自动生成代理学习的课程。更具体地说,RCG将目标周围最初的州分布调整为距离,随着培训的进行。然而,每次迭代最初产生的州分布可能存在偏向性,从而使政策过度适应或减缓反向扩展率。虽然为基于演员-批评(AC)的RL算法培训RCG,但这种不完善的概括化和缓慢趋同可能是由于对AC的对配对之间的紧密结合引起的。因此,我们提出了一种平行方法,即同时培训多个AC对配方并定期交换其批评者。我们从经验上证明,这一拟议方法可以提高REG的性能和趋同性,也可以适用于基于AC的其他基于演员的RL算法,并经过调整的初始状态分布。

0

相关内容

可交换的

【文本生成现代方法】Modern Methods for Text Generation

【文本生成现代方法】Modern Methods for Text Generation

专知会员服务

44+阅读 · 2020年9月11日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Large Batch Experience Replay

Arxiv

0+阅读 · 2021年10月4日

Few-Shot Text Generation with Pattern-Exploiting Training

Arxiv

0+阅读 · 2021年10月4日

Meta Batch-Instance Normalization for Generalizable Person Re-Identification

Meta Batch-Instance Normalization for Generalizable Person Re-Identification

Arxiv

3+阅读 · 2021年3月29日

Curriculum Learning: A Survey

Arxiv

24+阅读 · 2021年1月25日

A General and Adaptive Robust Loss Function

A General and Adaptive Robust Loss Function

Arxiv

8+阅读 · 2018年11月5日

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年9月17日

Paraphrase Generation with Deep Reinforcement Learning

Paraphrase Generation with Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年8月23日

Generative Adversarial Autoencoder Networks

Arxiv

11+阅读 · 2018年3月23日

Denoising Adversarial Autoencoders

Arxiv

9+阅读 · 2018年1月4日

A Unified approach for Conventional Zero-shot, Generalized Zero-shot and Few-shot Learning

Arxiv

4+阅读 · 2017年10月26日

VIP会员

文章信息

相关主题

相关VIP内容

【文本生成现代方法】Modern Methods for Text Generation

【文本生成现代方法】Modern Methods for Text Generation

专知会员服务

44+阅读 · 2020年9月11日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《使用量化测量将传感器节点关联到融合中心的算法设计》171页

军事前沿模型

提升军事训练能力的最佳人工智能模拟工具

《社交媒体信息作战》最新48页技术报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Large Batch Experience Replay

Arxiv

0+阅读 · 2021年10月4日

Few-Shot Text Generation with Pattern-Exploiting Training

Arxiv

0+阅读 · 2021年10月4日

Meta Batch-Instance Normalization for Generalizable Person Re-Identification

Meta Batch-Instance Normalization for Generalizable Person Re-Identification

Arxiv

3+阅读 · 2021年3月29日

Curriculum Learning: A Survey

Arxiv

24+阅读 · 2021年1月25日

A General and Adaptive Robust Loss Function

A General and Adaptive Robust Loss Function

Arxiv

8+阅读 · 2018年11月5日

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年9月17日

Paraphrase Generation with Deep Reinforcement Learning

Paraphrase Generation with Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年8月23日

Generative Adversarial Autoencoder Networks

Arxiv

11+阅读 · 2018年3月23日

Denoising Adversarial Autoencoders

Arxiv

9+阅读 · 2018年1月4日

A Unified approach for Conventional Zero-shot, Generalized Zero-shot and Few-shot Learning

Arxiv

4+阅读 · 2017年10月26日

微信扫码咨询专知VIP会员