Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent policies. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs comparably in two of the four tasks. These results provide insight into the limitations of current DQD algorithms in domains where gradients must be approximated. Source code is available at https://github.com/icaros-usc/dqd-rl.
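As context for "approximation methods common in reinforcement learning," one widely used black-box gradient estimator is the evolution-strategies-style estimator built from Gaussian perturbations of the policy parameters. The sketch below is a minimal illustration of that idea only, not the released implementation; the function name `es_gradient`, its parameters, and the assumption that the policy parameters are a flat vector are hypothetical.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.02, n_samples=100, rng=None):
    """Estimate the gradient of a black-box function f (e.g., an episode
    return or a behavior measure) at the flat parameter vector theta,
    using an evolution-strategies-style estimator with antithetic sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((n_samples, theta.size))
    grad = np.zeros_like(theta)
    for e in eps:
        # Antithetic pair (theta + sigma*e, theta - sigma*e) reduces variance.
        f_plus = f(theta + sigma * e)
        f_minus = f(theta - sigma * e)
        grad += (f_plus - f_minus) * e
    return grad / (2 * n_samples * sigma)
```

Such an estimator can stand in for the exact objective and measure gradients that DQD algorithms normally require, at the cost of extra environment rollouts per gradient estimate.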