One of the major limitations of deep learning models is that they suffer from catastrophic forgetting in incremental learning scenarios. Several approaches have been proposed to tackle the problem of incremental learning. Most of these methods are based on knowledge distillation and do not adequately utilize the information provided by older task models, such as uncertainty estimates for their predictions. Predictive uncertainty provides distributional information that can be applied to mitigate catastrophic forgetting in a deep learning framework. In the proposed work, we consider a Bayesian formulation to obtain the data and model uncertainties. We also incorporate a self-attention framework to address the incremental learning problem. We define distillation losses in terms of aleatoric uncertainty and self-attention, and we conduct ablation analyses of these losses. Furthermore, we obtain better results in terms of accuracy on standard benchmarks.
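The abstract does not give the exact form of the uncertainty-based distillation loss; the following is a minimal sketch, under the assumption of a PyTorch setup where the old-task (teacher) model predicts a per-sample log-variance as its aleatoric uncertainty. All names here (e.g. `uncertainty_weighted_distillation_loss`, `teacher_log_var`) are hypothetical illustrations, not the paper's API.

```python
import torch
import torch.nn.functional as F


def uncertainty_weighted_distillation_loss(student_logits, teacher_logits,
                                           teacher_log_var, temperature=2.0):
    """Hypothetical sketch: a knowledge-distillation term that is
    down-weighted where the old-task model's aleatoric (data)
    uncertainty, given as a predicted log-variance, is high."""
    # Soft targets from the old-task (teacher) model.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)

    # Per-sample KL divergence between teacher and student distributions.
    kl_per_sample = F.kl_div(student_log_probs, teacher_probs,
                             reduction='none').sum(dim=-1)

    # Map predicted log-variance to per-sample weights in (0, 1]:
    # confident (low-variance) teacher predictions contribute more.
    weights = torch.exp(-teacher_log_var.squeeze(-1))

    return (weights * kl_per_sample).mean() * (temperature ** 2)
```

A self-attention distillation term could be defined analogously, e.g. by penalizing the discrepancy between the attention maps of the old and new models; the paper's actual formulation may differ.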