Learning multiple tasks sequentially requires neural networks to balance retaining previously acquired knowledge with remaining flexible enough to adapt to new tasks. Regularizing the network parameters is a common approach, but it rarely incorporates prior knowledge about task relationships and only allows information to flow forward to future tasks. We propose a Bayesian framework that treats the network's parameters as the state space of a nonlinear Gaussian model, unlocking two key capabilities: (1) a principled way to encode domain knowledge about task relationships, allowing, e.g., control over which layers should adapt between tasks; and (2) a novel application of Bayesian smoothing, allowing task-specific models to also incorporate knowledge from models learned later. This does not require direct access to the later tasks' data, which is crucial, e.g., for privacy-critical applications. These capabilities rely on efficient filtering and smoothing operations, for which we propose a diagonal plus low-rank approximation of the precision matrix in the Laplace approximation (LR-LGF). Empirical results demonstrate the efficiency of LR-LGF and the benefits of the unlocked capabilities.
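To make the state-space view concrete, a minimal sketch of such a model follows; the linear-Gaussian transition, the symbols $A_t$, $Q_t$, and the rank parameter $r$ are illustrative assumptions, not the paper's exact definitions.

% Illustrative sketch (assumed forms): parameters \theta_t evolve across tasks t,
% with task data \mathcal{D}_t observed through the nonlinear network likelihood.
\begin{align}
  \theta_t \mid \theta_{t-1} &\sim \mathcal{N}\!\left(A_t\,\theta_{t-1},\; Q_t\right)
    && \text{(transition; $A_t$, $Q_t$ can encode task relationships)} \\
  \mathcal{D}_t \mid \theta_t &\sim p(\mathcal{D}_t \mid \theta_t)
    && \text{(nonlinear observation through the network)}
\end{align}

Under this view, filtering targets $p(\theta_t \mid \mathcal{D}_{1:t})$ and smoothing targets $p(\theta_t \mid \mathcal{D}_{1:T})$ with $T \geq t$, so a task-specific posterior can absorb information from later tasks without accessing their data. Each Laplace approximation yields a Gaussian whose precision is restricted to a diagonal plus low-rank form, $\Lambda_t \approx \operatorname{diag}(d_t) + U_t U_t^{\top}$ with $U_t \in \mathbb{R}^{P \times r}$ and $r \ll P$, which keeps the filtering and smoothing updates tractable for large networks.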