A New Perspective on Catastrophic Forgetting: The Transfer-Interference Trade-off

July 6, 2019, CreateAMind



 

  

1. Catastrophic Forgetting and the Stability-Plasticity Dilemma

 

Building neural networks that can learn incrementally without forgetting is one of the existential challenges facing the current generation of deep learning solutions. Over the years, solutions to the continual learning problem have been largely driven by prominent conceptualizations of the issues faced by neural networks. One popular view is catastrophic forgetting (interference), in which the primary concern is the lack of stability in neural networks, and the main solution is to limit the extent of weight sharing across experiences by focusing on preserving past knowledge. Another popular and more complex conceptualization is the stability-plasticity dilemma. In this view, the primary concern is the balance between network stability (to preserve past knowledge) and plasticity (to rapidly learn the current experience); techniques in this family focus on balancing limited weight sharing with some mechanism to ensure fast learning.

 

2. Transfer-Interference Trade-off

 

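At an instant in time with parameters θ and loss L, we can define operational measures of transfer and interference between two arbitrary distinct examples (x_i, y_i) and (x_j, y_j) while training with SGD. Transfer occurs when the dot product of the two loss gradients is positive:

$$\frac{\partial L(x_i, y_i)}{\partial \theta} \cdot \frac{\partial L(x_j, y_j)}{\partial \theta} > 0$$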

 

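This implies that learning example i will, without repetition, improve performance on example j and vice versa. Interference occurs when the dot product of the two loss gradients is negative:

$$\frac{\partial L(x_i, y_i)}{\partial \theta} \cdot \frac{\partial L(x_j, y_j)}{\partial \theta} < 0$$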

Here, in contrast, learning example i will lead to unlearning (i.e. forgetting) of example j and vice versa. There is weight sharing between i and j when they are learned using an overlapping set of parameters. So, potential for transfer is maximized when weight sharing is maximized while potential for interference is minimized when weight sharing is minimized.
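To make the two conditions concrete, here is a minimal sketch in plain Python. It assumes a linear model with squared loss, and the helper names (`grad_squared_loss`, `gradient_dot`, `classify`) and the toy examples are invented for illustration; they do not come from the original work.

```python
def grad_squared_loss(theta, x, y):
    """Gradient of 0.5 * (theta . x - y)^2 with respect to theta."""
    error = sum(t * xi for t, xi in zip(theta, x)) - y
    return [error * xi for xi in x]


def gradient_dot(theta, example_i, example_j):
    """Dot product of the loss gradients of two examples at the same theta."""
    g_i = grad_squared_loss(theta, *example_i)
    g_j = grad_squared_loss(theta, *example_j)
    return sum(a * b for a, b in zip(g_i, g_j))


def classify(theta, example_i, example_j):
    """Label a pair of examples by the sign of their gradient dot product."""
    dot = gradient_dot(theta, example_i, example_j)
    if dot > 0:
        return "transfer"
    if dot < 0:
        return "interference"
    return "neither"


theta = [0.0, 0.0]
a = ([1.0, 0.0], 1.0)    # wants theta[0] to increase
b = ([1.0, 1.0], 2.0)    # also wants theta[0] to increase
c = ([1.0, 0.0], -1.0)   # wants theta[0] to decrease
d = ([0.0, 1.0], 1.0)    # only touches theta[1]

print(classify(theta, a, b))
print(classify(theta, a, c))
print(classify(theta, a, d))
```

Examples `a` and `d` use disjoint parameters, so their gradients are orthogonal: with no weight sharing there is neither interference nor transfer, which is the trade-off described above.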

 

Figure: The transfer-interference trade-off presents a novel perspective on the goal of gradient alignment for the continual learning problem. Panel descriptions:

A) The stability-plasticity dilemma considers plasticity with respect to the current learning and how it degrades old learning. The transfer-interference trade-off considers the stability-plasticity dilemma and its dependence on weight sharing in both forward and backward directions. This symmetric view is crucial as solutions that purely focus on reducing the degree of weight-sharing are unlikely to produce transfer in the future.

B) A depiction of transfer in weight space.

C) A depiction of interference in weight space.

 

The key difference in perspective with past conceptualizations of continual learning is that we are not just concerned with current transfer and interference with respect to past examples, but also with the dynamics of transfer and interference moving forward as we learn. This new view of the problem leads to a natural meta-learning perspective on continual learning: we would like to learn to modify our learning to affect the dynamics of transfer and interference in a general sense.

 

To the extent that our meta-learning into the future generalizes, this should make it easier for our model to perform continual learning in non-stationary settings. We achieve this by building on past work on experience replay, with an algorithm that combines experience replay with optimization-based meta-learning.

 

3. Meta-Experience Replay (MER)

 

In typical offline supervised learning, we can express our optimization objective over the stationary distribution of (x, y) pairs within the dataset D:

$$\theta = \arg\min_{\theta} \; \mathbb{E}_{(x, y) \sim D} \left[ L(x, y) \right]$$

where L is the loss function, which can be selected to fit the problem. If we would like to maximize transfer and minimize interference, we can imagine it would be useful to add an auxiliary loss to the objective to bias the learning process in that direction, here weighted by a coefficient α:

$$\theta = \arg\min_{\theta} \; \mathbb{E}_{[(x_i, y_i), (x_j, y_j)] \sim D} \left[ L(x_i, y_i) + L(x_j, y_j) - \alpha \, \frac{\partial L(x_i, y_i)}{\partial \theta} \cdot \frac{\partial L(x_j, y_j)}{\partial \theta} \right]$$

where (x_i, y_i) and (x_j, y_j) are randomly sampled unique data points. If we could maximize the dot products between gradients at these different points, it would directly encourage the network to share parameters where gradient directions align and keep parameters separate where interference is caused by gradients in opposite directions.

 

However, there are multiple problems that must be addressed to implement this kind of learning process in practice. The first problem is that continual learning deals with learning over a non-stationary stream of data. We address this by implementing an experience replay module that augments online learning so that we can approximately optimize over the stationary distribution of all examples seen so far.
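A replay memory that approximates the distribution of all examples seen so far can be kept with reservoir sampling. The sketch below is a generic version of that technique, not the authors' implementation; the class name `ReservoirMemory` and its methods are assumptions made for this example.

```python
import random


class ReservoirMemory:
    """Fixed-size buffer holding a uniform random subset of a stream."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        """Offer one example; it is kept with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
            return
        slot = self.rng.randrange(self.seen)  # uniform over [0, seen)
        if slot < self.capacity:
            self.items[slot] = example

    def sample(self, n):
        """Draw up to n distinct stored examples uniformly at random."""
        return self.rng.sample(self.items, min(n, len(self.items)))


memory = ReservoirMemory(capacity=5, seed=0)
for t in range(100):
    memory.add(t)

replayed = memory.sample(3)
```

Because every example in the stream ends up in the buffer with equal probability, sampling from the buffer behaves approximately like sampling from the whole history, regardless of the order in which tasks arrived.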

Another practical problem is that the gradients of this loss depend on the second derivative of the loss function, which is expensive to compute. We address this by indirectly approximating the objective with a first-order Taylor expansion, using a meta-learning algorithm with minimal computational overhead.
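One way to see why a first-order method can stand in for the second-order objective is the standard Taylor-expansion argument used to analyze Reptile. The sketch below assumes two consecutive SGD steps with step size α on examples 1 and 2, writes g_i and H_i for the gradient and Hessian of L_i = L(x_i, y_i) at the starting point θ_0, and drops terms of order α³. These symbols are introduced here for illustration only.

$$
\begin{aligned}
\theta_1 &= \theta_0 - \alpha\, g_1, \\
\theta_2 &= \theta_1 - \alpha \left. \frac{\partial L_2}{\partial \theta} \right|_{\theta_1}
          \approx \theta_1 - \alpha \left( g_2 - \alpha H_2\, g_1 \right), \\
\theta_0 - \theta_2 &\approx \alpha \left( g_1 + g_2 \right) - \alpha^2 H_2\, g_1 .
\end{aligned}
$$

If the two examples are equally likely to arrive in either order, then $\mathbb{E}[H_2 g_1] = \tfrac{1}{2}\,\mathbb{E}[H_2 g_1 + H_1 g_2] = \tfrac{1}{2}\,\mathbb{E}\!\left[\frac{\partial}{\partial \theta}(g_1 \cdot g_2)\right]$, so

$$
\mathbb{E}\left[ \theta_0 - \theta_2 \right] \approx \alpha\, \mathbb{E}\left[ \frac{\partial}{\partial \theta} \left( L_1 + L_2 - \frac{\alpha}{2}\, g_1 \cdot g_2 \right) \right].
$$

In expectation, the net displacement after the two steps is therefore a gradient step on the two losses minus a multiple of the gradient dot product, even though only first derivatives were computed. The exact constant depends on the number of inner steps and on how the meta-update is scaled.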

 

In this work, we modify the Reptile algorithm (a first-order meta-learning algorithm developed at OpenAI) to integrate it with an experience replay module, facilitating continual learning while maximizing transfer and minimizing interference.

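Reptile objective, over s batches B_1, ..., B_s sampled from D, with α the SGD learning rate:

$$\theta = \arg\min_{\theta} \; \mathbb{E}_{B_1, \ldots, B_s \sim D} \left[ 2 \sum_{i=1}^{s} \left[ L(B_i) - \sum_{j=1}^{i-1} \alpha \, \frac{\partial L(B_i)}{\partial \theta} \cdot \frac{\partial L(B_j)}{\partial \theta} \right] \right]$$

MER objective, over the s × k examples (x_{ij}, y_{ij}) sampled from the memory M:

$$\theta = \arg\min_{\theta} \; \mathbb{E}_{[(x_{11}, y_{11}), \ldots, (x_{sk}, y_{sk})] \sim M} \left[ 2 \sum_{i=1}^{s} \sum_{j=1}^{k} \left[ L(x_{ij}, y_{ij}) - \sum_{q=1}^{i-1} \sum_{r=1}^{j-1} \alpha \, \frac{\partial L(x_{ij}, y_{ij})}{\partial \theta} \cdot \frac{\partial L(x_{qr}, y_{qr})}{\partial \theta} \right] \right]$$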

MER maintains an experience-replay-style memory M with reservoir sampling and, at each time step, draws s batches, each including k − 1 random samples from the buffer to be trained alongside the current example. Each of the k examples within each batch is treated as its own Reptile batch of size 1, with an inner-loop Reptile meta-update after that batch is processed. We then apply the Reptile meta-update again in an outer loop across the s batches.
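The procedure just described can be sketched in a few dozen lines of plain Python. This is an illustrative toy under stated assumptions, not the released implementation: it uses a two-parameter linear model with squared loss, places the current example last in each batch, and the names `mer_step`, `interpolate`, `reservoir_add`, and `make_stream`, the step sizes `alpha`, `beta`, `gamma`, and the toy task stream are all invented for this example.

```python
import random


def sgd_step(theta, x, y, lr):
    """One SGD step on the squared loss 0.5 * (theta . x - y)^2."""
    error = sum(t * xi for t, xi in zip(theta, x)) - y
    return [t - lr * error * xi for t, xi in zip(theta, x)]


def interpolate(start, end, rate):
    """Reptile meta-update: move from start toward end by a fraction rate."""
    return [a + rate * (b - a) for a, b in zip(start, end)]


def mer_step(theta, current, memory, s, k, alpha, beta, gamma, rng):
    """Process one incoming example with s replay batches of k examples."""
    theta_outer_start = list(theta)
    for _ in range(s):
        # k - 1 random memories plus the current example (placed last here,
        # an arbitrary ordering choice for this sketch).
        batch = rng.sample(memory, min(k - 1, len(memory))) + [current]
        theta_inner_start = list(theta)
        for x, y in batch:  # each example is its own batch of size 1
            theta = sgd_step(theta, x, y, alpha)
        # Inner-loop Reptile meta-update after the batch is processed.
        theta = interpolate(theta_inner_start, theta, beta)
    # Outer-loop Reptile meta-update across the s batches.
    return interpolate(theta_outer_start, theta, gamma)


def reservoir_add(memory, example, seen, capacity, rng):
    """Reservoir sampling update; seen counts examples including this one."""
    if len(memory) < capacity:
        memory.append(example)
    else:
        slot = rng.randrange(seen)
        if slot < capacity:
            memory[slot] = example


def make_stream(rng, n_per_task):
    """Toy non-stationary stream: task A (feature 0), then task B (feature 1)."""
    stream = []
    for _ in range(n_per_task):
        u = rng.uniform(-1.0, 1.0)
        stream.append(([u, 0.0], 2.0 * u))
    for _ in range(n_per_task):
        u = rng.uniform(-1.0, 1.0)
        stream.append(([0.0, u], -1.0 * u))
    return stream


rng = random.Random(0)
theta, memory = [0.0, 0.0], []
for seen, example in enumerate(make_stream(rng, 300), start=1):
    theta = mer_step(theta, example, memory, s=2, k=3,
                     alpha=0.3, beta=0.5, gamma=0.5, rng=rng)
    reservoir_add(memory, example, seen, capacity=50, rng=rng)

print(theta)  # should end up close to [2.0, -1.0]
```

The toy tasks use disjoint features, so this run only checks that the update rule learns both tasks in sequence; it says nothing about how MER compares with the baselines on realistic problems.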


4. Evaluations

 

To test the efficacy of MER we compare it to relevant baselines for continual learning of many supervised tasks:

• Online: represents online learning performance of a model trained straightforwardly one example at a time on the incoming non-stationary training data by simply applying SGD.
• Independent: an independent predictor per task, with fewer hidden units in proportion to the number of tasks. When useful, it can be initialized by cloning the last predictor.
• Task Input: has the same architecture as Online, but with a dedicated input layer per task.
• EWC: Elastic Weight Consolidation (EWC) is an algorithm that modifies online learning where the loss is regularized to avoid catastrophic forgetting.
• GEM: Gradient Episodic Memory (GEM) is an approach for making efficient use of episodic storage by following gradients on incoming examples to the maximum extent while altering them so that they do not interfere with past memories.
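As an illustration of what "the loss is regularized" means for the EWC baseline, the sketch below shows the usual quadratic EWC penalty with a diagonal Fisher information estimate. The function names and the numbers are hypothetical, and the Fisher values are simply given rather than estimated from data.

```python
def ewc_penalty(theta, theta_star, fisher_diag, lam):
    """Quadratic penalty anchoring theta to theta_star, the parameters after
    the previous task, weighted per parameter by the diagonal Fisher estimate."""
    return 0.5 * lam * sum(
        f * (t - ts) ** 2
        for t, ts, f in zip(theta, theta_star, fisher_diag)
    )


def ewc_loss(task_loss, theta, theta_star, fisher_diag, lam):
    """Loss on the current task plus the EWC regularizer."""
    return task_loss + ewc_penalty(theta, theta_star, fisher_diag, lam)


theta_star = [0.0, 2.0]    # parameters learned on the old task
fisher_diag = [4.0, 1.0]   # first parameter mattered more for the old task
theta = [1.0, 2.0]         # candidate parameters for the new task

penalty = ewc_penalty(theta, theta_star, fisher_diag, lam=0.5)
total = ewc_loss(0.25, theta, theta_star, fisher_diag, lam=0.5)
```

Parameters with a large Fisher value are expensive to move, which limits interference with the old task at the cost of limiting how freely those weights can be shared with new ones.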

 

We also evaluated MER in a continual reinforcement learning setting where the environment is highly non-stationary. Specifically, we used Catcher and Flappy Bird. In Flappy Bird, MER was trained to navigate through pipes while the pipe gap the bird must pass through becomes smaller and smaller as the game progresses. This environment is particularly challenging for continual reinforcement learning, given the sudden changes to the dynamics of gameplay, which test the agent's ability to detect changes in the environment without supervision.

In Catcher, we obtain different tasks by incrementally increasing the pellet velocity a total of 5 times during training. In these experiments, MER outperforms standard DQN models, as shown in the following figure:

 

Code available at https://github.com/mattriemer/mer.


 

