Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization - Zhuanzhi (专知) Papers


September 26, 2022

Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization


Lunjun Zhang, Bradly C. Stadie

Hindsight goal relabeling has become a foundational technique for multi-goal reinforcement learning (RL). The idea is quite simple: any arbitrary trajectory can be seen as an expert demonstration for reaching the trajectory's end state. Intuitively, this procedure trains a goal-conditioned policy to imitate a sub-optimal expert. However, this connection between imitation and hindsight relabeling is not well understood. Modern imitation learning algorithms are described in the language of divergence minimization, and yet it remains an open problem how to recast hindsight goal relabeling into that framework. In this work, we develop a unified objective for goal-reaching that explains such a connection, from which we can derive goal-conditioned supervised learning (GCSL) and the reward function in hindsight experience replay (HER) from first principles. Experimentally, we find that despite recent advances in goal-conditioned behaviour cloning (BC), multi-goal Q-learning can still outperform BC-like methods; moreover, a vanilla combination of both actually hurts model performance. Under our framework, we study when BC is expected to help, and empirically validate our findings. Our work further bridges goal-reaching and generative modeling, illustrating the nuances and new pathways of extending the success of generative models to RL.
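The relabeling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tuple layout, and the sparse 0/-1 reward are assumptions made for illustration, and the end state of the trajectory is used as the hindsight goal, as the abstract describes.

```python
from typing import List, Tuple

State = Tuple[int, int]  # hypothetical grid-world state, chosen for illustration
Action = str


def relabel_with_final_state(
    states: List[State], actions: List[Action]
) -> List[Tuple[State, Action, State, State, float]]:
    """Relabel one trajectory with its own end state as the goal.

    `states` holds s_0..s_T and `actions` holds a_0..a_{T-1}.
    Returns (state, action, next_state, goal, reward) tuples.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected exactly one more state than actions")
    goal = states[-1]  # hindsight goal: where the trajectory actually ended
    relabeled = []
    for t, action in enumerate(actions):
        next_state = states[t + 1]
        # Assumed sparse reward: 0 once the goal is reached, -1 otherwise.
        reward = 0.0 if next_state == goal else -1.0
        relabeled.append((states[t], action, next_state, goal, reward))
    return relabeled


def bc_pairs(relabeled):
    """Build ((state, goal), action) pairs for goal-conditioned imitation."""
    return [((s, g), a) for (s, a, _next_state, g, _reward) in relabeled]


# Usage: a two-step trajectory that happens to end at (1, 1).
trajectory_states = [(0, 0), (0, 1), (1, 1)]
trajectory_actions = ["right", "down"]
transitions = relabel_with_final_state(trajectory_states, trajectory_actions)
pairs = bc_pairs(transitions)
```

The same relabeled data serves both families of methods mentioned in the abstract: the reward-bearing transitions could feed a multi-goal Q-learning update, while the ((state, goal), action) pairs are the supervised targets a behaviour-cloning style method would fit.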


