Model-based deep learning has achieved astounding successes due in part to the availability of large-scale real-world data. However, processing such massive amounts of data comes at a considerable cost in terms of computation, storage, training, and the search for good neural architectures. Dataset distillation has thus recently come to the fore. This paradigm involves distilling information from large real-world datasets into tiny and compact synthetic datasets such that processing the latter yields performance similar to processing the former. State-of-the-art methods primarily rely on learning the synthetic dataset by matching the gradients obtained during training between the real and synthetic data. However, these gradient-matching methods suffer from the accumulated trajectory error caused by the discrepancy between the distillation and subsequent evaluation phases. To alleviate the adverse impact of this accumulated trajectory error, we propose a novel approach that encourages the optimization algorithm to seek a flat trajectory. We show that, with regularization towards a flat trajectory, the weights trained on synthetic data are robust against the perturbations induced by the accumulated errors. Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7% on a subset of the ImageNet dataset with higher-resolution images. We also validate the effectiveness and generalizability of our method on datasets of different resolutions and demonstrate its applicability to neural architecture search.
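To make the gradient-matching idea referenced above concrete, the following is a minimal PyTorch sketch of a single gradient-matching distillation step, not the paper's exact FTD algorithm (which additionally regularizes towards a flat training trajectory). All names here (SimpleNet, gradient_match_loss, the toy data) are illustrative assumptions.

```python
# Hedged sketch: synthetic images are updated so that the gradient they induce
# on a small network matches the gradient induced by a batch of real data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    """Tiny convnet standing in for the networks trained during distillation."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 32, 3, padding=1)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        h = F.relu(self.conv(x)).mean(dim=(2, 3))  # global average pooling
        return self.fc(h)

def gradient_match_loss(net, real_x, real_y, syn_x, syn_y):
    """Cosine-style distance between gradients from real and synthetic batches."""
    params = [p for p in net.parameters() if p.requires_grad]
    g_real = torch.autograd.grad(F.cross_entropy(net(real_x), real_y), params)
    g_syn = torch.autograd.grad(F.cross_entropy(net(syn_x), syn_y), params,
                                create_graph=True)  # keep graph so syn_x can be updated
    loss = 0.0
    for gr, gs in zip(g_real, g_syn):
        loss = loss + 1.0 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0)
    return loss

# Toy usage: a few learnable synthetic images, optimized against random "real" data.
net = SimpleNet()
syn_x = torch.randn(10, 3, 32, 32, requires_grad=True)  # learnable synthetic images
syn_y = torch.arange(10)                                # one image per class
opt = torch.optim.SGD([syn_x], lr=0.1)

real_x, real_y = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))
loss = gradient_match_loss(net, real_x, real_y, syn_x, syn_y)
opt.zero_grad()
loss.backward()
opt.step()
```

In an actual distillation pipeline this step would be repeated over many network initializations and training stages; FTD's contribution, per the abstract, is to bias the teacher trajectories used for matching towards flat regions so that accumulated trajectory error during evaluation is less harmful.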