CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning

May 3, 2022

Chenyu Sun, Hangwei Qian, Chunyan Miao

From arXiv. Full paper with supplementary material, accepted by IJCAI 2022. Acknowledgements and affiliations are updated.

In reinforcement learning (RL), it is challenging to learn directly from high-dimensional observations, where data augmentation has recently been shown to remedy this via encoding invariances from raw pixels. Nevertheless, we empirically find that not all samples are equally important and hence simply injecting more augmented inputs may instead cause instability in Q-learning. In this paper, we approach this problem systematically by developing a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF), which can fully exploit sample importance and improve learning efficiency in a self-supervised manner. Facilitated by the proposed contrastive curiosity, CCLF is capable of prioritizing the experience replay, selecting the most informative augmented inputs, and more importantly regularizing the Q-function as well as the encoder to concentrate more on under-learned data. Moreover, it encourages the agent to explore with a curiosity-based reward. As a result, the agent can focus on more informative samples and learn representation invariances more efficiently, with significantly reduced augmented inputs. We apply CCLF to several base RL algorithms and evaluate on the DeepMind Control Suite, Atari, and MiniGrid benchmarks, where our approach demonstrates superior sample efficiency and learning performances compared with other state-of-the-art methods.
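
The abstract does not give the formula for contrastive curiosity, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the curiosity of a sample is one minus the InfoNCE-style softmax probability that two augmented views of the same observation match, and that this score is reused both as a replay priority and as an exploration bonus. The function names, `temperature`, `alpha`, and `beta` are hypothetical.

```python
import numpy as np


def contrastive_curiosity(queries, keys, temperature=0.1):
    """Hypothetical curiosity score per sample, in [0, 1].

    queries[i] and keys[i] are embeddings of two augmented views of the same
    observation; the other rows of `keys` act as negatives (InfoNCE-style).
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                        # (B, B) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    # A low probability on the matching view suggests an under-learned sample.
    return 1.0 - np.diag(probs)


def sampling_probabilities(curiosity, alpha=1.0, eps=1e-6):
    """Assumed prioritization rule: sample in proportion to curiosity**alpha."""
    priorities = (curiosity + eps) ** alpha
    return priorities / priorities.sum()


def shaped_reward(reward, curiosity, beta=0.1):
    """Assumed exploration bonus: environment reward plus scaled curiosity."""
    return reward + beta * curiosity


# Toy batch: four one-hot embeddings; the last pair of views is mismatched.
queries = np.eye(4)
keys = np.eye(4)
keys[3] = keys[0]

curiosity = contrastive_curiosity(queries, keys)
print(curiosity)                          # the mismatched sample scores highest
print(sampling_probabilities(curiosity))  # and is replayed most often
```

In this toy batch the mismatched pair receives the largest score, so it would be replayed most often and would earn the largest bonus. How CCLF itself selects augmented inputs and regularizes the Q-function and the encoder is specified in the paper rather than in this abstract.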

