We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL. The new method builds on the concept of a geometric horizon model (GHM, also known as a gamma-model), which models the discounted state-visitation distribution of a given policy. We show that we can evaluate any non-Markov policy that switches between a set of base Markov policies with fixed probability by a careful composition of the base policy GHMs, without any additional learning. We can then apply generalised policy improvement (GPI) to collections of such non-Markov policies to obtain a new Markov policy that will in general outperform its precursors. We provide a thorough theoretical analysis of this approach, develop applications to transfer and standard RL, and empirically demonstrate its effectiveness over standard GPI on a challenging deep RL continuous control task. We also provide an analysis of GHM training methods, proving a novel convergence result regarding previously proposed methods and showing how to train these models stably in deep RL settings.
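As a concrete illustration of the generalised policy improvement (GPI) step the abstract refers to, the sketch below applies GPI at a single state: act greedily with respect to the pointwise maximum of the base policies' Q-values. The array contents and function name are illustrative assumptions, not the paper's implementation, which additionally evaluates non-Markov switching policies via composed GHMs.

```python
import numpy as np

# Hypothetical Q-values q[i, a] = Q^{pi_i}(s, a) for base policy i and
# action a, all evaluated at one fixed state s (made-up numbers).
q = np.array([
    [0.2, 0.9, 0.1],  # Q-values under base policy pi_0
    [0.7, 0.3, 0.4],  # Q-values under base policy pi_1
])

def gpi_action(q_values: np.ndarray) -> int:
    """GPI at a state: argmax over actions of the max over base policies."""
    return int(np.argmax(q_values.max(axis=0)))

a = gpi_action(q)  # action 1 attains the overall maximum value (0.9)
```

The resulting Markov policy is guaranteed to perform at least as well as every base policy it improves over, which is the property the paper extends to collections of GHM-evaluated non-Markov switching policies.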