The ability to learn continually is essential in a complex and changing world. In this paper, we characterize the behavior of canonical value-based deep reinforcement learning (RL) approaches under varying degrees of non-stationarity. In particular, we demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games. This phenomenon is alluded to in prior work under various guises -- e.g., loss of plasticity, implicit under-parameterization, primacy bias, and capacity loss. We investigate this phenomenon closely at scale and analyze how the weights, gradients, and activations change over time in several experiments with varying dimensions (e.g., similarity between games, number of games, number of frames per game), with some experiments spanning 50 days and 2 billion environment interactions. Our analysis shows that the activation footprint of the network becomes sparser, contributing to the diminishing gradients. We investigate a remarkably simple mitigation strategy -- the Concatenated ReLU (CReLU) activation function -- and demonstrate its effectiveness in facilitating continual learning in a changing environment.
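For reference, a minimal PyTorch sketch of the CReLU activation is given below (an illustrative implementation, not necessarily the authors' exact code): it concatenates ReLU(x) and ReLU(-x) along the feature dimension, so every unit contributes a non-zero activation for any non-zero pre-activation, and the output width is doubled relative to a plain ReLU layer.

```python
import torch
import torch.nn as nn


class CReLU(nn.Module):
    """Concatenated ReLU (Shang et al., 2016): returns [ReLU(x), ReLU(-x)]
    concatenated along the feature dimension, doubling the output width."""

    def __init__(self, dim: int = 1):
        super().__init__()
        self.dim = dim  # dimension along which to concatenate (1 = channels/features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat((torch.relu(x), torch.relu(-x)), dim=self.dim)
```

Note that downstream layers must account for the doubled feature dimension (e.g., a linear layer following a CReLU over d features takes 2d inputs).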