We propose a causal framework to explain catastrophic forgetting in Class-Incremental Learning (CIL) and derive from it a novel distillation method that is orthogonal to existing anti-forgetting techniques, such as data replay and feature/label distillation. Within this framework, we 1) place CIL in causal terms, 2) answer why forgetting happens: the causal effect of the old data is lost during training on the new data, and 3) explain how the existing techniques mitigate it: they bring that causal effect back. Based on the framework, we find that although feature/label distillation is storage-efficient, its causal effect is not coherent with the merit of end-to-end feature learning, which is, however, preserved by data replay. To this end, we propose to distill the Colliding Effect between the old and the new data, which is fundamentally equivalent to the causal effect of data replay, but without any replay storage cost. Thanks to the causal effect analysis, we can further capture the Incremental Momentum Effect of the data stream; removing it helps to retain the old effect otherwise overwhelmed by the new data effect, and thus alleviates the forgetting of the old classes at test time. Extensive experiments on three CIL benchmarks, CIFAR-100 and ImageNet-Sub&Full, show that the proposed causal effect distillation improves various state-of-the-art CIL methods by a large margin (0.72%--9.06%).