Effective exploration is critical for reinforcement learning agents in environments with sparse rewards or high-dimensional state-action spaces. Recent works based on state-visitation counts, curiosity, and entropy maximization generate intrinsic reward signals that motivate the agent to visit novel states. However, the agent can be distracted by perturbations to its sensor inputs that contain novel but task-irrelevant information, e.g., due to sensor noise or a changing background. In this work, we introduce the sequential information bottleneck objective for learning compressed and temporally coherent representations by modelling and compressing the sequential predictive information in time-series observations. For efficient exploration in noisy environments, we further construct intrinsic rewards that capture task-relevant state novelty based on the learned representations. We derive a variational upper bound on our sequential information bottleneck objective for practical optimization and provide an information-theoretic interpretation of the derived bound. Our experiments on a set of challenging image-based simulated control tasks show that our method achieves better sample efficiency and greater robustness to both white noise and natural video backgrounds than state-of-the-art methods based on curiosity, entropy maximization, and information gain.