Proper optimization of deep neural networks is an open research question, since an optimal procedure for changing the learning rate throughout training is still unknown. Manually defining a learning rate schedule involves troublesome and time-consuming trial-and-error procedures to determine hyperparameters such as learning rate decay epochs and learning rate decay rates. Although adaptive learning rate optimizers automate this process, recent studies suggest they may produce overfitting and reduce performance when compared to fine-tuned learning rate schedules. Considering that the loss landscapes of deep neural networks contain far more saddle points than local minima, we propose the Training Aware Sigmoidal Optimizer (TASO), which consists of a two-phase automated learning rate schedule. The first phase uses a high learning rate to quickly traverse the numerous saddle points, while the second phase uses a low learning rate to slowly approach the center of the local minimum previously found. We compared the proposed approach with commonly used adaptive learning rate optimizers such as Adam, RMSProp, and Adagrad. Our experiments showed that TASO outperformed all competing methods in both optimal (i.e., performing hyperparameter validation) and suboptimal (i.e., using default hyperparameters) scenarios.
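To make the two-phase idea concrete, below is a minimal sketch of a sigmoidal learning rate schedule of the kind the abstract describes: the learning rate stays near a high value early in training and transitions smoothly to a low value later. The exact TASO formula and its hyperparameters are defined in the paper, so the names and defaults here (initial_lr, final_lr, steepness, the midpoint at half of training) are illustrative assumptions, not the authors' implementation.

```python
import math

def sigmoidal_lr(epoch, total_epochs, initial_lr=0.1, final_lr=1e-4, steepness=10.0):
    """Logistic interpolation from initial_lr to final_lr over training.

    Early epochs keep the learning rate close to initial_lr (fast traversal of
    saddle points); late epochs keep it close to final_lr (slow approach to the
    local minimum). The transition midpoint is placed at total_epochs / 2.
    """
    progress = epoch / total_epochs                              # in [0, 1]
    weight = 1.0 / (1.0 + math.exp(steepness * (progress - 0.5)))
    return final_lr + (initial_lr - final_lr) * weight

# Example: inspect the schedule for a hypothetical 100-epoch run.
for epoch in (0, 25, 50, 75, 99):
    print(epoch, round(sigmoidal_lr(epoch, 100), 5))
```

In practice, such a function could be passed to a framework's learning rate scheduler (e.g., a per-epoch lambda scheduler) so that the optimizer uses the high-rate phase first and the low-rate phase afterwards.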