Dataset distillation reduces network training cost by synthesizing small yet informative datasets from large-scale ones. Despite the success of recent dataset distillation algorithms, three drawbacks still limit their wider application: i) the synthetic images perform poorly on large architectures; ii) they need to be re-optimized whenever the distillation ratio changes; iii) their limited diversity restricts performance when the distillation ratio is large. In this paper, we propose a novel distillation scheme that \textbf{D}istills the information of a large training set \textbf{i}nto a generative \textbf{M}odel, named DiM. Specifically, DiM learns to use a generative model to store the information of the target dataset. During the distillation phase, we minimize the difference between the logits predicted by a pool of models on real and generated images. At the deployment stage, the generative model synthesizes diverse training samples from random noise on the fly. Owing to these simple yet effective designs, a trained DiM can be directly applied to different distillation ratios and large architectures without extra cost. We validate the proposed DiM on four datasets and achieve state-of-the-art results on all of them. To the best of our knowledge, we are the first to achieve higher accuracy on complex architectures than on simple ones, e.g., 75.1\% with ResNet-18 versus 72.6\% with ConvNet-3 at ten images per class on CIFAR-10. Moreover, DiM outperforms previous methods by 10\%$\sim$22\% at 1 and 10 images per class on the SVHN dataset.
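The following is a minimal sketch (not the authors' released code) of the logit-matching idea described above: a generator is optimized so that a pool of models produces similar logits on real and generated batches. All names (\texttt{generator}, \texttt{model\_pool}, \texttt{real\_images}, \texttt{noise\_dim}) are illustrative assumptions, and matching batch-mean logits here is a simplification of the paper's exact objective.

\begin{verbatim}
import torch
import torch.nn.functional as F

def distillation_step(generator, model_pool, real_images, noise_dim=100):
    # Sample random noise and synthesize a batch with the generator.
    noise = torch.randn(real_images.size(0), noise_dim,
                        device=real_images.device)
    fake_images = generator(noise)

    loss = 0.0
    for model in model_pool:              # pool of (randomly initialized) models
        with torch.no_grad():
            real_logits = model(real_images)   # targets: logits on real data
        fake_logits = model(fake_images)       # logits on generated data
        # Match batch-mean logits between real and generated images
        # (a simplified stand-in for the paper's matching objective).
        loss = loss + F.mse_loss(fake_logits.mean(dim=0),
                                 real_logits.mean(dim=0))

    loss = loss / len(model_pool)
    loss.backward()   # gradients flow back into the generator;
                      # the generator optimizer step is omitted here
    return loss.item()
\end{verbatim}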