Dataset condensation aims to synthesize a small dataset that replicates the training utility of a large original dataset. Existing condensation methods produce synthetic datasets with significant redundancy, which motivates reducing that redundancy and improving the diversity of the synthesized data. To this end, we propose an intuitive Diversity Regularizer (DiRe), composed of cosine similarity and Euclidean distance, that can be applied off-the-shelf to various state-of-the-art condensation methods. Through extensive experiments, we demonstrate that adding our regularizer improves state-of-the-art condensation methods on benchmark datasets ranging from CIFAR-10 to ImageNet-1K, with respect to both generalization and diversity metrics.
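To make the idea of a regularizer built from cosine similarity and Euclidean distance concrete, the following is a minimal sketch and not the DiRe formulation itself, which the abstract does not specify. It assumes the penalty is computed over all pairs of synthetic samples (for example, flattened images or feature vectors of one class), that high pairwise cosine similarity is penalized, and that large pairwise Euclidean distance is rewarded. The function name `diversity_penalty` and the weights `cos_weight` and `dist_weight` are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small epsilon avoids division by zero


def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def diversity_penalty(samples, cos_weight=1.0, dist_weight=1.0):
    """Illustrative penalty: lower values mean a more diverse sample set.

    Assumption (not taken from the paper): the two terms are averaged over
    all sample pairs and combined linearly with hypothetical weights.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0  # a single sample has no pairwise redundancy
    mean_cos = sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
    mean_dist = sum(euclidean_distance(u, v) for u, v in pairs) / len(pairs)
    # Penalize similar directions, reward samples that lie far apart.
    return cos_weight * mean_cos - dist_weight * mean_dist


# Three identical samples versus three samples pointing in different directions.
redundant = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(diversity_penalty(redundant) > diversity_penalty(diverse))
```

In a condensation pipeline, a term of this kind would presumably be added to the method's existing objective so that gradient updates push synthetic samples apart. How the two terms are weighted, and whether they act on pixels or on learned features, is left open here.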