The computational cost of training state-of-the-art deep models on many learning problems is rapidly increasing due to more sophisticated models and larger datasets. A recent promising direction for reducing training cost is dataset condensation, which aims to replace the original large training set with a significantly smaller learned synthetic set while preserving the original information. While training deep models on the small set of condensed images can be extremely fast, synthesizing those images remains computationally expensive due to the complex bi-level optimization and second-order derivative computation involved. In this work, we propose a simple yet effective method that synthesizes condensed images by matching the feature distributions of the synthetic and original training images in many sampled embedding spaces. Our method significantly reduces the synthesis cost while achieving comparable or better performance. Thanks to its efficiency, we apply our method to more realistic, larger datasets with sophisticated neural architectures and obtain a significant performance boost. We also show promising practical benefits of our method in continual learning and neural architecture search.
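The core idea above — matching feature distributions of synthetic and real images under many randomly sampled embeddings — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it uses random linear projections as a stand-in for the randomly initialized neural encoders, and the function name and parameters are hypothetical. It computes a mean-feature (first-moment) matching loss averaged over sampled embedding spaces.

```python
import numpy as np

def distribution_matching_loss(real, syn, n_embeddings=10, embed_dim=32, seed=0):
    """Hypothetical sketch of distribution matching.

    real: (N, d) array of real-image features (flattened pixels here).
    syn:  (M, d) array of synthetic-image features.
    For each of n_embeddings randomly sampled linear embeddings
    (a stand-in for random neural encoders), compare the mean
    embedded feature of the real and synthetic sets.
    """
    rng = np.random.default_rng(seed)
    d = real.shape[1]
    loss = 0.0
    for _ in range(n_embeddings):
        # Sample a random embedding space (scaled Gaussian projection).
        W = rng.standard_normal((d, embed_dim)) / np.sqrt(d)
        mu_real = (real @ W).mean(axis=0)  # mean feature of real set
        mu_syn = (syn @ W).mean(axis=0)    # mean feature of synthetic set
        loss += np.sum((mu_real - mu_syn) ** 2)
    return loss / n_embeddings
```

In the actual method the synthetic images would be updated by gradient descent on this loss (with an autograd framework and convolutional encoders); the single-level objective is what avoids the bi-level optimization and second-order derivatives mentioned above.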