Continual Learning addresses the challenge of learning a number of different tasks sequentially. The goal of maintaining knowledge of earlier tasks without re-accessing their data starkly conflicts with standard SGD training of artificial neural networks. An influential class of methods that tackles this problem without storing old data is so-called regularisation approaches. These methods measure the importance of each parameter for solving a given task and subsequently protect important parameters from large changes. In the literature, three ways of measuring parameter importance have been put forward, and they have inspired a large body of follow-up work. Here, we present strong theoretical and empirical evidence that these three methods, Elastic Weight Consolidation (EWC), Synaptic Intelligence (SI) and Memory Aware Synapses (MAS), are surprisingly similar and are all linked to the same theoretical quantity. Concretely, we show that, despite stemming from very different motivations, both SI and MAS approximate the square root of the Fisher Information, with the Fisher being the theoretically justified basis of EWC. Moreover, we show that for SI the relation to the Fisher -- and in fact its performance -- is due to a previously unknown bias. On top of uncovering unknown similarities and unifying regularisation approaches, we also demonstrate that our insights enable practical performance improvements for large batch training.
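The central claim can be made concrete with the standard published definitions of the two importance measures. Below is a minimal sketch in PyTorch; the toy linear model, random unlabeled inputs, and sample size are illustrative assumptions, not the paper's experimental setup. It accumulates the diagonal Fisher Information underlying EWC and the MAS importance, then correlates the MAS values with the square root of the Fisher across parameters.

```python
# Sketch: diagonal Fisher (EWC) vs. MAS importance on a toy model.
# Assumptions: any differentiable model and input distribution; the
# abstract's claim is that MAS tracks sqrt(Fisher) up to scaling.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(10, 5)          # hypothetical toy classifier
params = list(model.parameters())
data = torch.randn(256, 10)             # unlabeled inputs suffice for both measures

fisher = [torch.zeros_like(p) for p in params]
mas = [torch.zeros_like(p) for p in params]

for x in data:
    logits = model(x)
    # EWC: diagonal Fisher, E[(d log p(y|x) / d theta)^2] with y ~ p(y|x)
    y = torch.distributions.Categorical(logits=logits).sample()
    nll = F.cross_entropy(logits.unsqueeze(0), y.unsqueeze(0))
    grads = torch.autograd.grad(nll, params)
    for f_acc, g in zip(fisher, grads):
        f_acc += g.pow(2)
    # MAS: E[ |d ||f(x)||^2 / d theta| ], gradient of the squared output norm
    out_norm = model(x).pow(2).sum()
    grads = torch.autograd.grad(out_norm, params)
    for m_acc, g in zip(mas, grads):
        m_acc += g.abs()

fisher = [f_acc / len(data) for f_acc in fisher]
mas = [m_acc / len(data) for m_acc in mas]

# Correlate MAS importance with sqrt(Fisher) across all parameters
flat_sqrt_fisher = torch.cat([f_acc.sqrt().flatten() for f_acc in fisher])
flat_mas = torch.cat([m_acc.flatten() for m_acc in mas])
corr = torch.corrcoef(torch.stack([flat_sqrt_fisher, flat_mas]))[0, 1]
print(f"corr(MAS, sqrt(Fisher)) = {corr.item():.3f}")
```

Note that both measures are computed without labels or stored data from earlier tasks: the Fisher samples labels from the model's own predictive distribution, and MAS only needs the network's outputs, which is precisely what makes these importance estimates usable in the continual-learning setting described above.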