In many fields, obtaining advanced models depends on large datasets, which makes data storage and model training expensive. As a solution, dataset distillation can synthesize a small dataset that preserves most of the information in the original large dataset. The recently proposed dataset distillation method based on matching network parameters has proven effective for several datasets. However, the dimensionality of the network parameters is typically large. Furthermore, some parameters are difficult to match during the distillation process, which degrades distillation performance. Motivated by this observation, this study proposes a novel dataset distillation method based on parameter pruning that addresses this problem. By pruning difficult-to-match parameters during the distillation process, the proposed method can synthesize more robust distilled datasets and improve distillation performance. Experimental results on three datasets show that the proposed method outperforms other state-of-the-art dataset distillation methods.
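The following is a minimal sketch of the core idea described above, under the assumption that "difficult-to-match" parameters are those with the largest mismatch between the two networks and that they are simply masked out of a parameter-matching loss; the function name, threshold rule, and toy setup are illustrative, not the authors' implementation.

```python
# Minimal sketch (assumption, not the paper's code): parameter-matching
# distillation where difficult-to-match parameters are pruned (masked out)
# of the matching loss. Names and the pruning rule are illustrative.
import torch

def matching_loss_with_pruning(student_params, teacher_params, prune_ratio=0.1):
    """Match flattened network parameters, dropping the fraction of entries
    with the largest mismatch (used here as a proxy for 'difficult to match')."""
    s = torch.cat([p.flatten() for p in student_params])
    t = torch.cat([p.flatten() for p in teacher_params])
    diff = (s - t).abs()
    # Keep only the easiest-to-match entries; prune the rest from the loss.
    k = int((1.0 - prune_ratio) * diff.numel())
    kept = torch.topk(diff, k, largest=False).values
    return (kept ** 2).mean()

# Toy usage: two small parameter sets standing in for the networks trained
# on the distilled data and on the original data, respectively.
student = [torch.randn(8, 4, requires_grad=True), torch.randn(4, requires_grad=True)]
teacher = [torch.randn(8, 4), torch.randn(4)]
loss = matching_loss_with_pruning(student, teacher, prune_ratio=0.2)
loss.backward()  # in a real setup, gradients would flow back to the distilled images
print(loss.item())
```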