A notable weakness of current machine learning algorithms is their limited ability to solve new problems without forgetting previously acquired knowledge. The continual learning paradigm has emerged as a protocol for systematically investigating settings in which a model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We derive this principle from a Bayesian perspective and show its connections to previous approaches to continual learning. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths through the network, governed by a gating policy. Equipped with a diverse and specialized set of parameters, each path can be regarded as a distinct sub-network that learns to solve tasks. To improve expert allocation, we introduce diversity objectives, which we evaluate in additional ablation studies. Importantly, our approach can operate in a task-agnostic way: unlike many existing continual learning algorithms, it does not require task-specific knowledge. Because the formulation is based on generic utility functions, this optimality principle applies to a wide variety of learning problems, including supervised learning, reinforcement learning, and generative modeling. We demonstrate the competitive performance of our method on continual reinforcement learning and on variants of the MNIST, CIFAR-10, and CIFAR-100 datasets.
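To make the idea of gated information processing paths more concrete, the following is a minimal NumPy sketch of a layer with several experts and a gating policy. It is not the authors' implementation: the class name `GatedExpertLayer`, the linear experts, the diagonal Gaussian weight posteriors, the top-k routing, and the KL penalty toward a stored prior are all assumptions made for illustration. The KL term shows one common way to penalize drift from previously learned parameters; the abstract does not specify the exact form of the objective.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class GatedExpertLayer:
    """Hypothetical sketch: linear experts with Gaussian weight posteriors
    and a softmax gate that routes each input to its top-k experts."""

    def __init__(self, d_in, d_out, n_experts, seed=0):
        self.rng = np.random.default_rng(seed)
        # Gating policy parameters (assumed to be a single linear map).
        self.gate_w = self.rng.normal(0.0, 0.1, (d_in, n_experts))
        # Mean and log standard deviation of each expert's weight posterior.
        self.mu = self.rng.normal(0.0, 0.1, (n_experts, d_in, d_out))
        self.log_sigma = np.full((n_experts, d_in, d_out), -3.0)

    def forward(self, x, top_k=1):
        probs = softmax(x @ self.gate_w)                  # (batch, experts)
        # Sample expert weights with the reparameterization trick.
        eps = self.rng.standard_normal(self.mu.shape)
        w = self.mu + np.exp(self.log_sigma) * eps        # (experts, d_in, d_out)
        expert_out = np.einsum("bi,mio->bmo", x, w)       # (batch, experts, d_out)
        # Keep only the top-k experts per sample and renormalize the gate.
        idx = np.argsort(-probs, axis=-1)[:, :top_k]
        mask = np.zeros_like(probs)
        np.put_along_axis(mask, idx, 1.0, axis=-1)
        gate = probs * mask
        gate = gate / gate.sum(axis=-1, keepdims=True)
        y = np.einsum("bm,bmo->bo", gate, expert_out)     # (batch, d_out)
        return y, gate

    def kl_to_prior(self, prior_mu, prior_log_sigma):
        """Per-expert KL(q || p) between diagonal Gaussians q (current
        posterior) and p (for example, the posterior after earlier tasks)."""
        var = np.exp(2.0 * self.log_sigma)
        prior_var = np.exp(2.0 * prior_log_sigma)
        kl = (prior_log_sigma - self.log_sigma
              + (var + (self.mu - prior_mu) ** 2) / (2.0 * prior_var)
              - 0.5)
        return kl.sum(axis=(1, 2))


# Usage: route a small batch through two of five experts.
layer = GatedExpertLayer(d_in=4, d_out=3, n_experts=5)
x = np.random.default_rng(1).standard_normal((8, 4))
y, gate = layer.forward(x, top_k=2)
penalty = layer.kl_to_prior(layer.mu.copy(), layer.log_sigma.copy())
```

In this sketch, a training loss would combine a task utility (for example, a negative log-likelihood) with the KL penalty weighted by a coefficient, so that a larger coefficient favors retaining old knowledge and a smaller one favors adapting to new data. The gate needs no task labels, which mirrors the task-agnostic setting described above.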