In many machine learning problems, large-scale datasets have become the de facto standard for training state-of-the-art deep networks, at the price of a heavy computational load. In this paper, we focus on condensing large training sets into significantly smaller synthetic sets that can be used to train deep neural networks from scratch with a minimal drop in performance. Inspired by recent training set synthesis methods, we propose Differentiable Siamese Augmentation, which enables the effective use of data augmentation to synthesize more informative synthetic images and thus achieves better performance when networks are trained with augmentations. Experiments on multiple image classification benchmarks demonstrate that the proposed method obtains substantial gains over the state of the art, with 7% improvements on the CIFAR10 and CIFAR100 datasets. We show that, using less than 1% of the original data, our method achieves 99.6%, 94.9%, 88.5%, and 71.5% relative performance on MNIST, FashionMNIST, SVHN, and CIFAR10 respectively. We also explore the use of our method in continual learning and neural architecture search, and show promising results.
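The core idea can be illustrated with a short sketch. The code below is not the authors' released implementation; it assumes a PyTorch setting, uses a single differentiable random-shift augmentation for illustration, and the helper names `siamese_augment` and `gradient_matching_loss` are hypothetical. The key point is that one set of randomly sampled augmentation parameters is applied identically to the real and synthetic batches, and the augmentation is differentiable, so a gradient-matching objective can propagate gradients back into the synthetic images.

```python
import torch
import torch.nn.functional as F

def siamese_augment(x_real, x_syn, max_shift=0.125):
    """Apply one shared, differentiable random shift to both batches."""
    shift = (torch.rand(2) * 2 - 1) * max_shift          # shared random parameters
    theta = torch.eye(2, 3).unsqueeze(0)                 # identity affine matrix
    theta[:, :, 2] = shift                               # same shift for both batches

    def warp(x):
        grid = F.affine_grid(theta.repeat(x.size(0), 1, 1).to(x.device),
                             x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)   # differentiable warp

    return warp(x_real), warp(x_syn)

def gradient_matching_loss(net, loss_fn, x_real, y_real, x_syn, y_syn):
    """Distance between parameter gradients induced by real vs. synthetic data."""
    params = tuple(net.parameters())
    g_real = torch.autograd.grad(loss_fn(net(x_real), y_real), params)
    g_syn = torch.autograd.grad(loss_fn(net(x_syn), y_syn), params,
                                create_graph=True)        # keep graph so x_syn receives gradients
    return sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
               for a, b in zip(g_syn, g_real))
```

In use, the synthetic images would be a small learnable tensor (e.g. `x_syn = torch.randn(n, 3, 32, 32, requires_grad=True)`) optimized by repeatedly augmenting both batches with `siamese_augment` and taking gradient steps on `gradient_matching_loss`; the actual method covers a richer family of augmentations than the single shift shown here.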