Naively trained neural networks tend to suffer catastrophic forgetting in sequential task settings, where data from previous tasks are unavailable. A number of methods, using various model expansion strategies, have recently been proposed as possible solutions. However, determining how much to expand the model is left to the practitioner, and a constant schedule is often chosen for simplicity, regardless of how complex the incoming task is. Instead, we propose a principled Bayesian nonparametric approach based on the Indian Buffet Process (IBP) prior, letting the data determine how much to expand the model complexity. We pair this with a factorization of the neural network's weight matrices. This approach allows the number of factors of each weight matrix to scale with the complexity of the task, while the IBP prior encourages sparse weight factor selection and factor reuse, promoting positive knowledge transfer between tasks. We demonstrate the effectiveness of our method on a number of continual learning benchmarks and analyze how weight factors are allocated and reused throughout training.
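The allocation behavior the abstract describes, reusing existing factors with rich-get-richer probabilities while opening new ones at a decaying Poisson rate, follows from the IBP's standard generative process. Below is a minimal NumPy sketch of drawing a binary task-by-factor allocation matrix from an IBP prior; the function name and interface are illustrative, not the paper's actual implementation.

```python
import numpy as np

def sample_ibp(num_tasks, alpha, seed=0):
    """Draw a binary allocation matrix Z from an IBP prior.

    Row t marks which weight factors task t uses. Task t reuses an
    existing factor k with probability (tasks using k so far) / t,
    then opens Poisson(alpha / t) brand-new factors, so the expected
    number of factors grows only logarithmically with the task count.
    """
    rng = np.random.default_rng(seed)
    rows = []     # per-task binary rows (ragged; zero-padded at the end)
    counts = []   # counts[k] = number of tasks using factor k so far
    for t in range(1, num_tasks + 1):
        # reuse factor k with probability counts[k] / t (rich get richer)
        row = [int(rng.random() < c / t) for c in counts]
        counts = [c + r for c, r in zip(counts, row)]
        # open new factors at a decaying rate alpha / t
        n_new = rng.poisson(alpha / t)
        row += [1] * n_new
        counts += [1] * n_new
        rows.append(row)
    K = len(counts)  # total number of factors instantiated
    return np.array([r + [0] * (K - len(r)) for r in rows], dtype=int)

Z = sample_ibp(num_tasks=10, alpha=3.0)
print(Z)  # early columns (factors) tend to be shared across many tasks
```

The sparsity parameter `alpha` controls how readily new factors are created: larger values expand capacity faster, while the reuse probabilities encourage later tasks to share factors introduced by earlier ones.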