While a diverse collection of continual learning (CL) methods has been proposed to prevent catastrophic forgetting, a thorough investigation of their effectiveness for processing sequential data with recurrent neural networks (RNNs) is lacking. Here, we provide the first comprehensive evaluation of established CL methods on a variety of sequential data benchmarks. Specifically, we shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs. In contrast to feedforward networks, RNNs iteratively reuse a shared set of weights and require working memory to process input samples. We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements, which lead to an increased need for stability at the cost of decreased plasticity for learning subsequent tasks. We additionally provide theoretical arguments supporting this interpretation by studying linear RNNs. Our study shows that established CL methods can be successfully ported to the recurrent case, and that a recent regularization approach based on hypernetworks outperforms weight-importance methods, thus emerging as a promising candidate for CL in RNNs. Overall, we provide insights on the differences between CL in feedforward networks and RNNs, while guiding towards effective solutions to tackle CL on sequential data.
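As a point of reference, the following is a standard formulation (a sketch of the usual definitions, not reproduced from this work) of the two ingredients the abstract refers to: a vanilla RNN reuses a single set of weights $\theta = \{W_{hh}, W_{xh}, b\}$ at every time step,
\[
h_t = \phi\!\left(W_{hh}\, h_{t-1} + W_{xh}\, x_t + b\right), \qquad t = 1, \dots, T,
\]
and an elastic-weight-consolidation penalty that, after training on task $A$, regularizes these shared weights while learning task $B$,
\[
\mathcal{L}(\theta) = \mathcal{L}_B(\theta) + \frac{\lambda}{2} \sum_i F_i \left(\theta_i - \theta^{*}_{A,i}\right)^2,
\]
where $\theta^{*}_{A}$ are the weights found on task $A$ and $F_i$ is a diagonal Fisher-information estimate of how important parameter $\theta_i$ is for that task. Because $W_{hh}$ enters the computation at every time step, tasks with high working-memory demands tend to yield large importance estimates on the recurrent weights, which is the stability-versus-plasticity trade-off discussed above.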