In this report we consider the following problem: given a trained model that is partially faulty, can we correct its behaviour without having to train the model from scratch? In other words, can we ``debug'' neural networks much as we address bugs in our mathematical models and standard computer code? We base our approach on the hypothesis that debugging can be treated as a two-task continual learning problem. In particular, we employ a modified version of a continual learning algorithm called Orthogonal Gradient Descent (OGD) to demonstrate, via two simple experiments on the MNIST dataset, that we can in fact \textit{unlearn} the undesirable behaviour while retaining the general performance of the model, and that we can additionally \textit{relearn} the appropriate behaviour, both without having to train the model from scratch.
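To make the approach concrete, the following is a minimal sketch of the gradient projection at the heart of OGD, assuming parameter gradients are flattened into plain vectors: gradients stored on the original task span a subspace, and updates for the new task are projected onto its orthogonal complement so that, to first order, predictions on the original task are preserved. The helper names and toy vectors here are illustrative, and the sketch omits the modifications to OGD made in this report.

\begin{verbatim}
import numpy as np

# Minimal sketch of the OGD projection step, assuming flattened
# parameter gradients as numpy vectors. Helper names are illustrative.

def orthonormal_basis(stored_grads):
    """Gram-Schmidt over gradients stored on the original task."""
    basis = []
    for g in stored_grads:
        v = g.copy()
        for b in basis:
            v -= (v @ b) * b        # remove component along existing basis
        norm = np.linalg.norm(v)
        if norm > 1e-10:            # skip (near-)dependent directions
            basis.append(v / norm)
    return basis

def project_orthogonal(grad, basis):
    """Project a new-task gradient onto the orthogonal complement of the
    stored-gradient subspace, so the update leaves original-task outputs
    unchanged to first order."""
    g = grad.copy()
    for b in basis:
        g -= (g @ b) * b
    return g

# Toy usage with one stored gradient and one new gradient.
stored = [np.array([1.0, 0.0, 0.0])]
basis = orthonormal_basis(stored)
g_new = np.array([0.5, 1.0, -2.0])
print(project_orthogonal(g_new, basis))  # [ 0.  1. -2.]
\end{verbatim}

In the debugging setting described above, the stored gradients would come from the behaviour one wishes to retain, while the projected updates drive the unlearning or relearning task.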