Recently the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether LARS and LAMB have any benefit over traditional, generic algorithms. In this work we demonstrate that standard optimization algorithms such as Nesterov momentum and Adam can match or exceed the results of LARS and LAMB at large batch sizes. Our results establish new, stronger baselines for future comparisons at these batch sizes and shed light on the difficulties of comparing optimizers for neural network training more generally.
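As a rough illustration of the layer-wise normalization mentioned above, the sketch below shows a LARS-style heavy-ball momentum step for a single layer's weights. It is a minimal sketch, not the paper's exact formulation: the parameter names (`lr`, `trust_coeff`, `eps`) and the precise placement of the trust ratio are illustrative assumptions, since published LARS implementations differ in these details.

```python
import numpy as np

def lars_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9,
                       weight_decay=1e-4, trust_coeff=0.001, eps=1e-9):
    """One LARS-style heavy-ball step for a single layer's weights.

    The layer-wise normalization rescales the raw step by a trust ratio
    proportional to ||w|| / ||grad + weight_decay * w||, so each layer's
    update is scaled relative to its own weight norm.
    """
    update = grad + weight_decay * w
    w_norm = np.linalg.norm(w)
    u_norm = np.linalg.norm(update)
    # Trust ratio; fall back to 1.0 if either norm is zero.
    if w_norm > 0 and u_norm > 0:
        trust_ratio = trust_coeff * w_norm / (u_norm + eps)
    else:
        trust_ratio = 1.0
    velocity = momentum * velocity + trust_ratio * update
    w = w - lr * velocity
    return w, velocity
```

LAMB applies the same idea to Adam: the Adam update direction for each layer is rescaled by an analogous layer-wise trust ratio before being applied. Dropping this rescaling recovers the standard Heavy-ball and Adam updates that the paper uses as baselines.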