D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning

October 26, 2022

Caroline Wang, Garrett Warnell, Peter Stone

While combining imitation learning (IL) and reinforcement learning (RL) is a promising way to address poor sample efficiency in autonomous behavior acquisition, methods that do so typically assume that the requisite behavior demonstrations are provided by an expert that behaves optimally with respect to a task reward. If, however, suboptimal demonstrations are provided, a fundamental challenge appears in that the demonstration-matching objective of IL conflicts with the return-maximization objective of RL. This paper introduces D-Shape, a new method for combining IL and RL that uses ideas from reward shaping and goal-conditioned RL to resolve the above conflict. D-Shape allows learning from suboptimal demonstrations while retaining the ability to find the optimal policy with respect to the task reward. We experimentally validate D-Shape in sparse-reward gridworld domains, showing that it both improves over RL in terms of sample efficiency and converges consistently to the optimal policy in the presence of suboptimal demonstrations.
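The abstract names two ingredients, reward shaping and goal-conditioned RL, without giving the algorithm. The following is only an illustrative sketch of those two ideas in a gridworld, not the authors' method. It assumes a demonstration state is used as the goal, and the function names (`potential`, `shaped_reward`, `goal_conditioned_obs`) and the Manhattan-distance potential are hypothetical choices made for this example. The shaping term has the standard potential-based form, gamma * phi(s') - phi(s), which is known to leave the optimal policy unchanged for a fixed potential function; how D-Shape selects goals and retains optimality is not stated in the abstract.

```python
from typing import Tuple

State = Tuple[int, int]  # (row, col) cell in a gridworld


def potential(state: State, goal: State) -> float:
    """Hypothetical potential: negative Manhattan distance to the goal."""
    return -float(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))


def shaped_reward(task_reward: float, state: State, next_state: State,
                  goal: State, gamma: float = 0.99) -> float:
    """Task reward plus the potential-based term gamma*phi(s') - phi(s).

    The goal is assumed to be a state taken from a (possibly suboptimal)
    demonstration; that choice is an assumption of this sketch.
    """
    shaping = gamma * potential(next_state, goal) - potential(state, goal)
    return task_reward + shaping


def goal_conditioned_obs(state: State, goal: State) -> Tuple[int, int, int, int]:
    """Concatenate state and goal so a policy can condition on both."""
    return state + goal


if __name__ == "__main__":
    demo_goal = (0, 3)
    # One step toward the demonstration state earns a positive shaping bonus.
    print(shaped_reward(0.0, (0, 0), (0, 1), demo_goal, gamma=1.0))
    print(goal_conditioned_obs((0, 0), demo_goal))
```

In a sparse-reward task, the shaping term gives a learning signal on every step, while the task reward still determines which policy is optimal.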


