Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also gaining popularity in gradient-based optimization, where they improve the rate of convergence of accelerated gradient schemes on ill-conditioned functions. In this paper, we propose a simple warm restart technique for stochastic gradient descent (SGD) to improve its anytime performance when training deep neural networks. We empirically study its performance on the CIFAR-10 and CIFAR-100 datasets, where we demonstrate new state-of-the-art results of 3.14% and 16.21% test error, respectively. We also demonstrate its advantages on a dataset of EEG recordings and on a downsampled version of the ImageNet dataset. Our source code is available at https://github.com/loshchil/SGDR
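As a rough illustration of the warm-restart idea summarized above, the sketch below computes a cosine-annealed learning rate that is periodically reset to its initial value, with the restart period growing by a fixed factor. The function name and the parameters (eta_min, eta_max, T_0, T_mult) are illustrative assumptions, not the paper's exact interface.

```python
import math

def sgdr_learning_rate(epoch, eta_min=0.0, eta_max=0.1, T_0=10, T_mult=2):
    """Minimal sketch of a cosine-annealed learning rate with warm restarts.

    The rate decays from eta_max to eta_min over T_0 epochs, is then reset
    ("warm restart"), and the period length is multiplied by T_mult.
    """
    T_i, t = T_0, epoch
    # Locate the current restart period and the position within it.
    while t >= T_i:
        t -= T_i
        T_i *= T_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / T_i))

# Example usage: print the schedule for the first 30 epochs
# (restarts occur at epochs 10 and 30 with T_0=10, T_mult=2).
for epoch in range(30):
    print(epoch, round(sgdr_learning_rate(epoch), 4))
```

In this sketch only the learning rate is reset at a restart; the model weights are kept, which is what makes the restart "warm" rather than a full re-initialization.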