Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning - 专知 (Zhuanzhi) Papers


Importance Sampling · Learning · Samples · Critic · Networking

December 22, 2022

Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning


Mahammad Humayoo, Xueqi Cheng

Off-policy learning is less stable than on-policy learning in reinforcement learning (RL). One reason for this instability is the discrepancy between the target policy ($\pi$) and behavior policy ($b$) distributions. This discrepancy can be alleviated by employing a smoothed variant of importance sampling (IS), such as relative importance sampling (RIS), which has a parameter $\beta \in [0, 1]$ that controls the degree of smoothing. To cope with the instability, we present the first relative importance sampling off-policy actor-critic (RIS-Off-PAC) model-free algorithms in RL. In our method, the network yields a target policy (the actor) and a value function (the critic) that evaluates the current policy $\pi$ using samples drawn from the behavior policy. To train the algorithm, we use action values generated by the behavior policy, rather than by the target policy, in the reward function. We also use deep neural networks for both the actor and the critic. We evaluated our algorithm on a number of OpenAI Gym benchmark problems and demonstrate performance better than or comparable to several state-of-the-art RL baselines.
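The abstract does not spell out the RIS weight. A common form, assumed here from relative density-ratio estimation and not confirmed by the abstract, replaces the ordinary ratio $\pi/b$ with the ratio of $\pi$ to a $\beta$-mixture of the two policies:

$$
w_\beta(s,a) = \frac{\pi(a \mid s)}{\beta\,\pi(a \mid s) + (1-\beta)\,b(a \mid s)}, \qquad \beta \in [0, 1].
$$

Setting $\beta = 0$ recovers ordinary IS, $w_0 = \pi(a \mid s)/b(a \mid s)$; setting $\beta = 1$ gives $w_1 = 1$, i.e. no correction at all. For $\beta > 0$ the denominator is at least $\beta\,\pi(a \mid s)$, so $w_\beta \le 1/\beta$: the weight stays bounded even when $b(a \mid s)$ is tiny. For example, with $\pi(a \mid s) = 0.9$, $b(a \mid s) = 0.1$ and $\beta = 0.5$:

$$
w_{0.5} = \frac{0.9}{0.5 \cdot 0.9 + 0.5 \cdot 0.1} = \frac{0.9}{0.5} = 1.8, \qquad \text{versus} \qquad w_0 = \frac{0.9}{0.1} = 9.
$$

The price is bias: $w_\beta$ is an exact IS ratio only for samples drawn from the mixture $\beta\pi + (1-\beta)b$, not for samples drawn from $b$ itself.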

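The actor-critic structure described in the abstract (a target-policy actor and a critic, both trained from behavior-policy samples with smoothed IS weights) can be sketched with a generic one-step off-policy actor-critic update. This is a minimal illustration under stated assumptions, not the authors' method: the RIS weight is assumed to be $\pi/(\beta\pi + (1-\beta)b)$, a tabular softmax actor and a scalar critic stand in for the paper's deep networks, the two-armed bandit and the names `train_bandit`, `ris_weight` and `softmax` are hypothetical, and the abstract's use of behavior-policy action values in the reward function is not modelled.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def ris_weight(pi_prob, b_prob, beta):
    """Assumed RIS weight pi / (beta*pi + (1-beta)*b); beta=0 is plain IS, beta=1 gives 1."""
    if pi_prob == 0.0:
        return 0.0  # action impossible under pi: it contributes nothing
    return pi_prob / (beta * pi_prob + (1.0 - beta) * b_prob)


def train_bandit(beta=0.5, steps=2000, alpha_actor=0.1, alpha_critic=0.1, seed=0):
    """Hypothetical toy: one-step off-policy actor-critic on a two-armed bandit.

    Arm 0 pays 1 and arm 1 pays 0. Actions are drawn from a fixed uniform
    behavior policy b, while the softmax target policy pi (actor) and the
    scalar state value V (critic) are updated with RIS-weighted TD errors.
    Returns pi's final action probabilities and the critic's estimate.
    """
    rng = random.Random(seed)
    rewards = [1.0, 0.0]
    behavior = [0.5, 0.5]
    theta = [0.0, 0.0]  # actor preferences; pi = softmax(theta)
    value = 0.0         # critic: V(s) for the bandit's single state
    for _ in range(steps):
        pi = softmax(theta)
        a = 0 if rng.random() < behavior[0] else 1  # act with b, not with pi
        w = ris_weight(pi[a], behavior[a], beta)
        delta = rewards[a] - value  # TD error; episodes end after one step, so no bootstrap term
        value += alpha_critic * w * delta  # RIS-weighted critic update
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == a else 0.0) - pi[k]  # d log pi(a) / d theta_k
            theta[k] += alpha_actor * w * delta * grad_log_pi  # RIS-weighted actor update
    return softmax(theta), value


if __name__ == "__main__":
    for beta in (0.0, 0.5, 1.0):
        probs, v = train_bandit(beta=beta)
        print(f"beta={beta}: pi(arm 0)={probs[0]:.3f}, V={v:.3f}")
```

In this toy every weight is positive and both arms' TD errors push the preferences toward arm 0, so all three settings should end up preferring arm 0. The difference shows in the critic: with `beta=1.0` every weight is 1, so V simply tracks the behavior policy's average reward rather than the value of $\pi$; smaller $\beta$ moves the critic toward $\pi$'s value at the cost of potentially larger weights.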

