Task-free continual learning (CL) aims to learn from a non-stationary data stream, without explicit task definitions, while not forgetting previous knowledge. The widely adopted memory replay approach can gradually become less effective for long data streams, as the model may memorize the stored examples and overfit the memory buffer. In addition, existing methods overlook the high uncertainty in the memory data distribution, since there is a large gap between the memory data distribution and the distribution of all previous data examples. To address these problems, we propose, for the first time, a principled memory evolution framework that dynamically evolves the memory data distribution by making the memory buffer gradually harder to memorize with distributionally robust optimization (DRO). We then derive a family of methods that evolve the memory buffer data in the continuous probability measure space with Wasserstein gradient flow (WGF). The proposed DRO is with respect to the worst-case evolved memory data distribution, and thus guarantees model performance and learns significantly more robust features than existing memory-replay-based methods. Extensive experiments on existing benchmarks demonstrate the effectiveness of the proposed methods in alleviating forgetting. As a by-product of the proposed framework, our method is more robust to adversarial examples than existing task-free CL methods.
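A minimal sketch of how such a memory-evolution step might look, assuming the inner DRO maximization is approximated by a few noisy gradient-ascent steps on the stored examples (a common discretization of a Wasserstein gradient flow); the function and parameter names (`evolve_memory`, `replay_step`, `eta`, `noise_std`) are hypothetical illustrations, not the paper's actual implementation:

```python
# Illustrative sketch only: the inner DRO step evolves memory examples toward a
# worst-case distribution (higher replay loss) via noisy gradient ascent, and the
# outer step trains the model on current data plus the evolved memory.
import torch
import torch.nn.functional as F

def evolve_memory(model, mem_x, mem_y, steps=3, eta=0.1, noise_std=0.01):
    """Evolve memory inputs so they become harder to memorize (inner maximization)."""
    x = mem_x.clone().detach()
    for _ in range(steps):
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), mem_y)
        grad, = torch.autograd.grad(loss, x)
        # Gradient ascent on the replay loss plus Gaussian noise (Langevin-style
        # update, one simple discretization of a Wasserstein gradient flow).
        x = (x + eta * grad + noise_std * torch.randn_like(x)).detach()
    return x

def replay_step(model, optimizer, cur_x, cur_y, mem_x, mem_y):
    """One training step on the current batch and the evolved memory (outer minimization)."""
    evolved_x = evolve_memory(model, mem_x, mem_y)
    optimizer.zero_grad()
    loss = (F.cross_entropy(model(cur_x), cur_y)
            + F.cross_entropy(model(evolved_x), mem_y))
    loss.backward()
    optimizer.step()
    return loss.item()
```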