Off-Policy Actor-Critic with Emphatic Weightings - 专知 (Zhuanzhi) Papers


August 11, 2022

Off-Policy Actor-Critic with Emphatic Weightings


Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White

A variety of theoretically sound policy gradient algorithms exist for the on-policy setting due to the policy gradient theorem, which provides a simplified form for the gradient. The off-policy setting, however, has been less clear due to the existence of multiple objectives and the lack of an explicit off-policy policy gradient theorem. In this work, we unify these objectives into one off-policy objective, and provide a policy gradient theorem for this unified objective. The derivation involves emphatic weightings and interest functions. We show multiple strategies to approximate the gradients, in an algorithm called Actor Critic with Emphatic weightings (ACE). We prove in a counterexample that previous (semi-gradient) off-policy actor-critic methods, particularly OffPAC and DPG, converge to the wrong solution whereas ACE finds the optimal solution. We also highlight why these semi-gradient approaches can still perform well in practice, suggesting strategies for variance reduction in ACE. We empirically study several variants of ACE on two classic control environments and an image-based environment designed to illustrate the tradeoffs made by each gradient approximation. We find that by approximating the emphatic weightings directly, ACE performs as well as or better than OffPAC in all settings tested.
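The abstract does not spell out how an emphatic weighting is computed, so the following is only a minimal sketch under stated assumptions. It uses the follow-on trace recursion commonly associated with emphatic methods, F_t = γ ρ_{t-1} F_{t-1} + i(S_t) and M_t = (1 - λ_a) i(S_t) + λ_a F_t, where i is the interest function and ρ is the importance sampling ratio. The names `EmphaticTrace`, `lambda_a`, `actor_step_scale`, and `emphasis_sequence` are hypothetical, and this trace-based estimate is not the directly approximated weighting that the abstract reports as performing best.

```python
from dataclasses import dataclass


@dataclass
class EmphaticTrace:
    """Running emphatic weighting for an actor update (hypothetical helper)."""
    gamma: float            # discount factor
    lambda_a: float         # assumed trade-off: 0 -> interest only, 1 -> full follow-on trace
    followon: float = 0.0   # F_{t-1}
    prev_rho: float = 0.0   # rho_{t-1}; starts at 0 so that F_0 = i(S_0)

    def step(self, interest: float, rho: float) -> float:
        """Return M_t for the current state, then store rho_t for the next step."""
        # F_t = gamma * rho_{t-1} * F_{t-1} + i(S_t)
        self.followon = self.gamma * self.prev_rho * self.followon + interest
        # M_t = (1 - lambda_a) * i(S_t) + lambda_a * F_t
        emphasis = (1.0 - self.lambda_a) * interest + self.lambda_a * self.followon
        self.prev_rho = rho
        return emphasis


def actor_step_scale(rho: float, emphasis: float, td_error: float) -> float:
    """Scalar multiplying grad log pi(a|s) in this sketch's actor update."""
    return rho * emphasis * td_error


def emphasis_sequence(lambda_a: float, rhos: list, gamma: float = 0.5) -> list:
    """Emphasis values along one trajectory with unit interest in every state."""
    trace = EmphaticTrace(gamma=gamma, lambda_a=lambda_a)
    return [trace.step(interest=1.0, rho=r) for r in rhos]
```

In this sketch, setting `lambda_a` to 0 makes the weighting equal to the interest alone, which corresponds to the semi-gradient style of update the abstract attributes to OffPAC, while `lambda_a` equal to 1 uses the full trace. Intermediate values trade bias against the variance that the product of importance ratios introduces.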

