The learning rate schedule is a critical component of deep neural network training. Several schedules and methods have been proposed, including step decay, adaptive methods, cosine schedules, and cyclical schedules. This paper proposes a new scheduling method, named hyperbolic-tangent decay (HTD). We run experiments on several benchmarks: ResNet, Wide ResNet, and DenseNet on the CIFAR-10 and CIFAR-100 datasets, an LSTM on the PAMAP2 dataset, and ResNet on the ImageNet and Fashion-MNIST datasets. In our experiments, HTD outperforms the step decay and cosine schedules in nearly all cases, while requiring fewer hyperparameters than step decay and being more flexible than the cosine schedule. Code is available at https://github.com/BIGBALLON/HTD.
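The abstract does not spell out HTD's functional form, so the following is a minimal sketch under a stated assumption: the learning rate decays from its initial value toward zero along a tanh curve swept between a lower bound L and an upper bound U over the total number of training steps T. The names htd_lr, L, and U, and the default values -6.0 and 3.0, are illustrative assumptions, not taken from the paper or its repository.

```python
import math

def htd_lr(t, total_steps, base_lr, L=-6.0, U=3.0):
    """Sketch of a hyperbolic-tangent decay (HTD) schedule.

    Assumption: lr(t) = base_lr / 2 * (1 - tanh(L + (U - L) * t / T)),
    where L and U are hypothetical bounds controlling how early and how
    sharply the decay occurs. With L << 0 the schedule starts near
    base_lr; with U > 0 it ends near zero.
    """
    progress = t / total_steps
    return base_lr / 2.0 * (1.0 - math.tanh(L + (U - L) * progress))

# Usage: print the learning rate at a few points in training.
if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        print(step, round(htd_lr(step, total_steps=100, base_lr=0.1), 5))
```

In this sketch, L and U play the role of the extra flexibility claimed over the cosine schedule: shifting them changes where along training the bulk of the decay happens, while step decay would instead require choosing a full list of milestone epochs and decay factors.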