Continual learning - learning new tasks in sequence while maintaining performance on old tasks - remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but is instead worst in an intermediate regime of task similarity. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon, which we name the Maslow's hammer hypothesis. Our analysis reveals a trade-off between node activation and node re-use that results in the worst forgetting occurring in the intermediate similarity regime. Using this understanding, we reinterpret popular algorithmic interventions for catastrophic interference in terms of this trade-off, and identify the regimes in which they are most effective.
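To make the teacher-student framework concrete, below is a minimal NumPy sketch of the kind of experiment the abstract describes: a two-layer ReLU student is trained online on one teacher, then on a second teacher whose hidden-layer weights overlap with the first's by a correlation `gamma` (the task-similarity knob), and forgetting is measured as the rise in generalisation error on the first task. The architecture, the fixed read-out head, and all hyperparameters here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, N_STEPS, LR = 50, 8, 20_000, 0.05  # input dim, hidden width, SGD steps, step size

def make_teachers(gamma):
    """Two one-hidden-layer teachers whose hidden weights overlap with correlation gamma."""
    w1 = rng.standard_normal((H, D))
    noise = rng.standard_normal((H, D))
    w2 = gamma * w1 + np.sqrt(1.0 - gamma**2) * noise  # gamma controls task similarity
    v = np.ones(H)  # shared fixed read-out head (an illustrative simplification)
    return (w1, v), (w2, v)

def net_out(x, W, v):
    """Output of a one-hidden-layer ReLU network with weights (W, v)."""
    return v @ np.maximum(W @ x, 0.0) / np.sqrt(H)

def train(W, v, teacher, steps):
    """Online SGD on the squared loss against a fixed teacher; only W is trained."""
    tw, tv = teacher
    for _ in range(steps):
        x = rng.standard_normal(D)
        h = np.maximum(W @ x, 0.0)
        err = v @ h / np.sqrt(H) - net_out(x, tw, tv)
        grad_W = err * np.outer(v * (h > 0), x) / np.sqrt(H)
        W -= LR * grad_W
    return W

def gen_error(W, v, teacher, n=2_000):
    """Monte-Carlo estimate of the generalisation error against a teacher."""
    tw, tv = teacher
    xs = rng.standard_normal((n, D))
    sq = [(net_out(x, W, v) - net_out(x, tw, tv)) ** 2 for x in xs]
    return 0.5 * float(np.mean(sq))

for gamma in [0.0, 0.25, 0.5, 0.75, 1.0]:
    t1, t2 = make_teachers(gamma)
    W = 0.1 * rng.standard_normal((H, D))
    v = np.ones(H)
    W = train(W, v, t1, N_STEPS)          # learn task 1
    e_before = gen_error(W, v, t1)
    W = train(W, v, t2, N_STEPS)          # then learn task 2
    e_after = gen_error(W, v, t1)
    print(f"gamma={gamma:.2f}  forgetting on task 1: {e_after - e_before:.4f}")
```

Sweeping `gamma` in this way is how one would probe whether forgetting peaks at intermediate similarity rather than growing monotonically with task dissimilarity; the exact shape of the curve depends on width, learning rate, and training time, which are arbitrary here.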