We propose an algorithm that compresses the critical information of a large dataset into compact addressable memories. These memories can then be recalled to quickly re-train a neural network and recover its performance (instead of storing and re-training on the full original dataset). Building upon the dataset distillation framework, we make the key observation that a shared common representation allows for more efficient and effective distillation. Concretely, we learn a set of bases (aka "memories") which are shared between classes and combined through learned flexible addressing functions to generate a diverse set of training examples. This leads to several benefits: 1) the size of the compressed data does not necessarily grow linearly with the number of classes; 2) an overall higher compression rate with more effective distillation is achieved; and 3) more generalized queries are allowed beyond recalling the original classes. We demonstrate state-of-the-art results on the dataset distillation task across five benchmarks, including gains of up to 16.5% and 9.7% in retained accuracy when distilling CIFAR10 and CIFAR100, respectively. We then leverage our framework to perform continual learning, achieving state-of-the-art results on four benchmarks, with a 23.2% accuracy improvement on MANY.
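To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch of class-shared memory bases combined through learned addressing coefficients. It is not the authors' implementation: the names (`AddressableMemory`, `num_bases`, `recalls_per_class`, `recall`) and the use of a single linear addressing matrix per class are illustrative assumptions; the actual method learns flexible addressing functions and optimizes them through the distillation objective.

```python
import torch
import torch.nn as nn


class AddressableMemory(nn.Module):
    """Toy sketch: a shared bank of basis vectors ("memories") plus
    per-class addressing coefficients that recall synthetic examples.

    The bases are shared across all classes, so the compressed size does
    not have to grow linearly with the number of classes; only the small
    addressing tensors are class-specific.
    """

    def __init__(self, num_classes, num_bases, recalls_per_class, feature_dim):
        super().__init__()
        # Shared bases (the compressed "memories").
        self.bases = nn.Parameter(torch.randn(num_bases, feature_dim) * 0.01)
        # Per-class addressing coefficients: each class mixes the shared
        # bases into `recalls_per_class` synthetic training examples.
        self.address = nn.Parameter(
            torch.randn(num_classes, recalls_per_class, num_bases) * 0.01)

    def recall(self):
        """Generate synthetic examples and labels for quick re-training."""
        # (C, R, K) x (K, D) -> (C, R, D): every class addresses the same bases.
        synthetic = torch.einsum('crk,kd->crd', self.address, self.bases)
        num_classes, recalls, dim = synthetic.shape
        labels = torch.arange(num_classes).repeat_interleave(recalls)
        return synthetic.reshape(-1, dim), labels
```

In a distillation loop, both `self.bases` and `self.address` would be optimized jointly (e.g., by back-propagating through a few inner training steps of a student network), so that the recalled set acts as a compact stand-in for the full dataset when re-training the model.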