Real-world sequential decision making requires data-driven algorithms that provide practical guarantees on performance throughout training while also making efficient use of data. Model-free deep reinforcement learning provides a framework for such data-driven decision making, but existing algorithms typically focus on only one of these goals while sacrificing performance with respect to the other. On-policy algorithms guarantee policy improvement throughout training but suffer from high sample complexity, while off-policy algorithms make efficient use of data through sample reuse but lack theoretical guarantees. To balance these competing goals, we develop a class of Generalized Policy Improvement algorithms that combines the policy improvement guarantees of on-policy methods with the efficiency of theoretically supported sample reuse. We demonstrate the benefits of this new class of algorithms through extensive experimental analysis on a variety of continuous control tasks from the DeepMind Control Suite.