The powerful learning ability of deep neural networks enables reinforcement learning agents to learn competent control policies directly from continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general reinforcement learning paradigm, where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" and a collapse in performance. In this paper, we present IQ, i.e., interference-aware deep Q-learning, to mitigate catastrophic interference in single-task deep reinforcement learning. Specifically, we resort to online clustering to achieve on-the-fly context division, together with a multi-head network and a knowledge distillation regularization term for preserving the policies of learned contexts. Built upon deep Q-networks, IQ consistently improves stability and performance compared to existing methods, as verified by extensive experiments on classic control and Atari tasks. The code is publicly available at: https://github.com/Sweety-dm/Interference-aware-Deep-Q-learning.
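To make the abstract's components concrete, the following is a minimal sketch (not the authors' released code) of a multi-head Q-network with one head per context and a loss that combines a TD term on the active context's head with a knowledge-distillation term that keeps the remaining heads close to a frozen copy of the network, limiting interference with previously learned contexts. All names here (MultiHeadQNet, iq_style_loss, distill_weight, n_contexts) are illustrative assumptions; the context assignment is presumed to come from the online clustering step described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadQNet(nn.Module):
    """Shared trunk with one Q-value head per context (illustrative)."""

    def __init__(self, obs_dim: int, n_actions: int, n_contexts: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # One head per context discovered by the online clustering step.
        self.heads = nn.ModuleList(nn.Linear(hidden, n_actions) for _ in range(n_contexts))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        # Shape: (batch, n_contexts, n_actions)
        return torch.stack([head(h) for head in self.heads], dim=1)


def iq_style_loss(net, frozen_net, batch, context_id, gamma=0.99, distill_weight=1.0):
    """TD loss on the active context's head plus distillation on the inactive heads."""
    obs, act, rew, next_obs, done = batch
    q_all = net(obs)                                              # (B, C, A)
    q_sa = q_all[:, context_id].gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = frozen_net(next_obs)[:, context_id].max(dim=1).values
        target = rew + gamma * (1.0 - done) * next_q
        q_teacher = frozen_net(obs)                               # teacher outputs
    td_loss = F.smooth_l1_loss(q_sa, target)
    # Regularize the heads of the other contexts toward the frozen copy,
    # preserving their learned policies against catastrophic interference.
    mask = torch.ones(q_all.shape[1], dtype=torch.bool)
    mask[context_id] = False
    distill_loss = F.mse_loss(q_all[:, mask], q_teacher[:, mask])
    return td_loss + distill_weight * distill_loss
```

In this sketch, `frozen_net` would be a periodically updated copy of `net` (e.g., `copy.deepcopy(net)`), playing the dual role of DQN target network and distillation teacher; the paper's exact update schedule and regularization weighting may differ.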