One significant shortcoming of machine learning is the limited ability of models to solve new problems quickly without forgetting previously acquired knowledge. To better understand this issue, continual learning has emerged to systematically investigate learning protocols in which a model sequentially observes samples generated by a series of tasks. First, we propose an optimality principle that facilitates a trade-off between learning and forgetting. We derive this principle from an information-theoretic formulation of bounded rationality and show its connections to other continual learning methods. Second, based on this principle, we propose a neural network layer for continual learning, called Mixture-of-Variational-Experts (MoVE), that alleviates forgetting while enabling the beneficial transfer of knowledge to new tasks. Our experiments on variants of the MNIST and CIFAR10 datasets demonstrate the competitive performance of MoVE layers compared to state-of-the-art approaches.
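As an illustration of the kind of optimality principle involved, bounded rationality is commonly formalized as a variational free-energy trade-off; a minimal sketch under that standard form (the exact objective used in this work may differ) is

\[
q_t^{*}(\theta) \;=\; \arg\max_{q} \; \mathbb{E}_{q(\theta)}\!\left[\log p(\mathcal{D}_t \mid \theta)\right] \;-\; \tfrac{1}{\beta}\, D_{\mathrm{KL}}\!\left(q(\theta)\,\|\,q_{t-1}(\theta)\right),
\]

where $\mathcal{D}_t$ denotes the data of task $t$, $q_{t-1}(\theta)$ is the belief carried over from earlier tasks (serving as the prior), and $\beta$ controls the trade-off: the expected log-likelihood term drives learning on the new task, while the KL term penalizes departures from previously acquired knowledge and thereby limits forgetting.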