Dual Generator Offline Reinforcement Learning - 专知论文


November 2, 2022

Dual Generator Offline Reinforcement Learning


Quan Vuong, Aviral Kumar, Sergey Levine, Yevgen Chebotar

From arXiv, NeurIPS 2022

In offline RL, constraining the learned policy to remain close to the data is essential to prevent the policy from outputting out-of-distribution (OOD) actions with erroneously overestimated values. In principle, generative adversarial networks (GANs) can provide an elegant way to do so, with the discriminator directly providing a probability that quantifies distributional shift. In practice, however, GAN-based offline RL methods have not performed as well as alternative approaches, perhaps because the generator is trained both to fool the discriminator and to maximize return, two objectives that can be at odds with each other. In this paper, we show that the issue of conflicting objectives can be resolved by training two generators: one that maximizes return, and another that captures the "remainder" of the data distribution in the offline dataset, such that the mixture of the two is close to the behavior policy. We show that having two generators not only enables an effective GAN-based offline RL method, but also approximates a support constraint, where the policy does not need to match the entire data distribution, but only the slice of the data that leads to high long-term performance. We name our method DASCO, for Dual-Generator Adversarial Support Constrained Offline RL. On benchmark tasks that require learning from sub-optimal data, DASCO significantly outperforms prior methods that enforce distribution constraints.
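The mixture idea in the abstract can be illustrated with a toy example on a single state with five discrete actions. This is only a sketch under stated assumptions, not the DASCO training procedure, which the abstract describes as adversarial training of generators against a discriminator. The function names, the equal mixture weight of 0.5, and the numbers are all hypothetical. The sketch picks the return-maximizing policy whose mixture with a valid "remainder" distribution reproduces the behavior policy exactly, which caps each action's probability at the data probability divided by the mixture weight.

```python
def primary_policy(p_data, q_values, weight=0.5):
    """Greedy return-maximizing policy under a mixture constraint.

    Assumption (illustrative): weight * policy + (1 - weight) * aux == p_data,
    so policy[a] can be at most p_data[a] / weight for the remainder to stay
    non-negative.
    """
    cap = [p / weight for p in p_data]
    # Visit actions from highest to lowest estimated value.
    order = sorted(range(len(p_data)), key=lambda a: q_values[a], reverse=True)
    policy = [0.0] * len(p_data)
    remaining = 1.0
    for a in order:
        policy[a] = min(cap[a], remaining)
        remaining -= policy[a]
    return policy


def auxiliary_policy(p_data, policy, weight=0.5):
    """Remainder distribution that completes the mixture to p_data."""
    return [(p - weight * pi) / (1.0 - weight) for p, pi in zip(p_data, policy)]


# Hypothetical behavior policy: action 4 never appears in the data.
p_data = [0.4, 0.3, 0.2, 0.1, 0.0]
# Hypothetical value estimates: action 4 is erroneously overestimated.
q_values = [1.0, 0.5, 2.0, 0.0, 5.0]

policy = primary_policy(p_data, q_values)
aux = auxiliary_policy(p_data, policy)
mixture = [0.5 * a + 0.5 * b for a, b in zip(policy, aux)]

print("primary  :", policy)
print("auxiliary:", aux)
print("mixture  :", mixture)
```

In this toy setting the primary policy places no probability on action 4 despite its inflated value, because that action lies outside the support of the data. It also does not have to reproduce the full behavior distribution: it concentrates on the higher-value actions in the data while the auxiliary distribution absorbs the rest.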


