The learning rate schedule is a critical component of deep neural network training. Several schedulers have been proposed, including step decay, adaptive methods, cosine schedulers, and cyclical schedulers. This paper proposes a new scheduling method, named hyperbolic-tangent decay (HTD). We run experiments on several benchmarks: ResNet, Wide ResNet, and DenseNet on the CIFAR-10 and CIFAR-100 datasets; an LSTM on the PAMAP2 dataset; and ResNet on the ImageNet and Fashion-MNIST datasets. In our experiments, HTD outperforms the step decay and cosine schedulers in nearly all cases, while requiring fewer hyperparameters than step decay and being more flexible than the cosine scheduler.
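Since the abstract does not spell out the HTD formula, the following is a minimal sketch of a tanh-shaped decay consistent with the method's name: the learning rate follows the falling half of a hyperbolic tangent as training progresses. The function name `htd_lr` and the bound parameters `lower` and `upper` (which control how flat the start and how sharp the final drop are) are illustrative assumptions, not values taken from the abstract.

```python
import math

def htd_lr(step: int, total_steps: int, base_lr: float,
           lower: float = -6.0, upper: float = 3.0) -> float:
    """Sketch of a hyperbolic-tangent decay (HTD) learning rate.

    Decays from roughly base_lr (tanh(lower) is near -1) to roughly 0
    (tanh(upper) is near 1) as `step` goes from 0 to `total_steps`.
    The defaults lower=-6, upper=3 are illustrative, not from the paper.
    """
    progress = step / total_steps
    return base_lr / 2.0 * (1.0 - math.tanh(lower + (upper - lower) * progress))

# Usage example: a 200-epoch run starting at lr = 0.1.
if __name__ == "__main__":
    for epoch in (0, 50, 100, 150, 200):
        print(epoch, htd_lr(epoch, total_steps=200, base_lr=0.1))
```

With a very negative `lower`, the schedule stays close to `base_lr` for the early epochs and then decays smoothly, which is one way a tanh curve can interpolate between the flatness of step decay and the gradual shape of cosine annealing.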