走向实用加强学习:提高普遍化和抽样效率,并配以政策组合 (Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble) - 专知论文

会员服务 ·

0

泛化理论 · 集成 · 优化器 · Performer · 稳健性 ·

2022 年 5 月 19 日

Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble

翻译：走向实用加强学习:提高普遍化和抽样效率,并配以政策组合

Zhengyu Yang,Kan Ren,Xufang Luo,Minghuan Liu,Weiqing Liu,Jiang Bian,Weinan Zhang,Dongsheng Li

from arxiv, Accepted in IJCAI 2022. The codes are available at https://seqml.github.io/eppo

It is challenging for reinforcement learning (RL) algorithms to succeed in real-world applications like financial trading and logistic system due to the noisy observation and environment shifting between training and evaluation. Thus, it requires both high sample efficiency and generalization for resolving real-world tasks. However, directly applying typical RL algorithms can lead to poor performance in such scenarios. Considering the great performance of ensemble methods on both accuracy and generalization in supervised learning (SL), we design a robust and applicable method named Ensemble Proximal Policy Optimization (EPPO), which learns ensemble policies in an end-to-end manner. Notably, EPPO combines each policy and the policy ensemble organically and optimizes both simultaneously. In addition, EPPO adopts a diversity enhancement regularization over the policy space which helps to generalize to unseen states and promotes exploration. We theoretically prove EPPO increases exploration efficacy, and through comprehensive experimental evaluations on various tasks, we demonstrate that EPPO achieves higher efficiency and is robust for real-world applications compared with vanilla policy optimization algorithms and other ensemble methods. Code and supplemental materials are available at https://seqml.github.io/eppo.

翻译：强化学习(RL)算法在现实世界应用中取得成功是困难的,例如金融贸易和后勤系统,因为培训和评估之间的观测和环境变化十分吵闹。因此,它需要高抽样效率和通用性,以便解决现实世界的任务。然而,直接应用典型的RL算法可能会在这种情景中造成不良的绩效。考虑到在监督学习(SL)中精度和概括化的混合方法的出色表现,我们设计了一种叫作 " 综合的优化政策(EPPO) " 的强有力和适用的方法,该方法以端到端的方式学习共同政策。值得注意的是,EPPO将每一项政策和政策都有机地结合起来,同时优化。此外,EPOPO对政策空间实行多样性加强规范,帮助向看不见的国家推广并促进探索。我们从理论上证明EPPO提高了探索效率,并通过对各项任务进行全面实验性评估,我们证明EPO与香草政策优化算法和其他高效率,对现实世界的应用是有力的。代码和补充材料在http://wwwplipqregio。

0

相关内容

泛化理论

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

AlGaN极化场调控对内量子效率的影响

国家自然科学基金

1+阅读 · 2016年12月31日

膜蛋白介导受IRES调控的cyclin B1促进食管癌转移的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

钙池调控钙内流对丁酸钠诱导大肠癌细胞凋亡调控机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

MiR-27a/b靶向沉默ABCA1调控胆固醇逆向转运

国家自然科学基金

0+阅读 · 2011年12月31日

5HRE与CEAp联合调控抑癌基因RASSF1A系统治疗CEA阳性肿瘤的基础研究

国家自然科学基金

0+阅读 · 2011年12月31日

低温弱光下黄瓜Rubisco活化酶基因过量表达效应的研究

国家自然科学基金

0+阅读 · 2009年12月31日

超声造影微血管显像与乳腺癌血管生成和血管内皮生长因子（VEGF）表达的相关性研究

国家自然科学基金

0+阅读 · 2009年12月31日

幽门螺杆菌益生菌型口服疫苗的研制

国家自然科学基金

0+阅读 · 2008年12月31日

二氧化硫胁迫诱发的拟南芥DNA甲基化修饰及转录调控作用

国家自然科学基金

0+阅读 · 2008年12月31日

High Performance Simulation for Scalable Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2022年7月8日

Safe reinforcement learning for multi-energy management systems with known constraint functions

Arxiv

0+阅读 · 2022年7月8日

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning

Arxiv

0+阅读 · 2022年7月8日

Stochastic optimal well control in subsurface reservoirs using reinforcement learning

Arxiv

0+阅读 · 2022年7月7日

Retro-RL: Reinforcing Nominal Controller With Deep Reinforcement Learning for Tilting-Rotor Drones

Arxiv

0+阅读 · 2022年7月7日

Energy-based Legged Robots Terrain Traversability Modeling via Deep Inverse Reinforcement Learning

Arxiv

0+阅读 · 2022年7月7日

Two-Sample Testing in Reinforcement Learning

Arxiv

0+阅读 · 2022年7月6日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Learning Discrete Structures for Graph Neural Networks

Arxiv

17+阅读 · 2019年3月28日

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

Arxiv

20+阅读 · 2018年1月8日

VIP会员

文章信息

相关主题

相关VIP内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

操作系统智能体：基于多模态大模型（MLLM）的通用计算设备智能体综述

《美国太空军系统全生命周期建模、仿真与分析效能提升方案》最新84页报告

【博士论文】推进数据高效的深度学习：非参数 Transformer、主动测试与上下文学习

自主人工智能：未来战争是否将是自主化的？

相关资讯

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

High Performance Simulation for Scalable Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2022年7月8日

Safe reinforcement learning for multi-energy management systems with known constraint functions

Arxiv

0+阅读 · 2022年7月8日

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning

Arxiv

0+阅读 · 2022年7月8日

Stochastic optimal well control in subsurface reservoirs using reinforcement learning

Arxiv

0+阅读 · 2022年7月7日

Retro-RL: Reinforcing Nominal Controller With Deep Reinforcement Learning for Tilting-Rotor Drones

Arxiv

0+阅读 · 2022年7月7日

Energy-based Legged Robots Terrain Traversability Modeling via Deep Inverse Reinforcement Learning

Arxiv

0+阅读 · 2022年7月7日

Two-Sample Testing in Reinforcement Learning

Arxiv

0+阅读 · 2022年7月6日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Learning Discrete Structures for Graph Neural Networks

Arxiv

17+阅读 · 2019年3月28日

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

Arxiv

20+阅读 · 2018年1月8日

相关基金

AlGaN极化场调控对内量子效率的影响

国家自然科学基金

1+阅读 · 2016年12月31日

膜蛋白介导受IRES调控的cyclin B1促进食管癌转移的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

钙池调控钙内流对丁酸钠诱导大肠癌细胞凋亡调控机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

MiR-27a/b靶向沉默ABCA1调控胆固醇逆向转运

国家自然科学基金

0+阅读 · 2011年12月31日

5HRE与CEAp联合调控抑癌基因RASSF1A系统治疗CEA阳性肿瘤的基础研究

国家自然科学基金

0+阅读 · 2011年12月31日

低温弱光下黄瓜Rubisco活化酶基因过量表达效应的研究

国家自然科学基金

0+阅读 · 2009年12月31日

超声造影微血管显像与乳腺癌血管生成和血管内皮生长因子（VEGF）表达的相关性研究

国家自然科学基金

0+阅读 · 2009年12月31日

幽门螺杆菌益生菌型口服疫苗的研制

国家自然科学基金

0+阅读 · 2008年12月31日

二氧化硫胁迫诱发的拟南芥DNA甲基化修饰及转录调控作用

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员