When training neural networks with low-precision computation, rounding errors often cause stagnation or are detrimental to the convergence of the optimizers. In this paper we study the influence of rounding errors on the convergence of the gradient descent method for problems satisfying the Polyak-Łojasiewicz inequality. Within this context, we show that biased stochastic rounding errors may, in contrast, be beneficial: choosing a proper rounding strategy eliminates the vanishing gradient problem and forces the rounding bias into a descent direction. Furthermore, we obtain a bound on the convergence rate that is tighter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performance of various rounding strategies when optimizing several examples with low-precision fixed-point number formats.
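For context, the two notions the abstract relies on can be stated as follows; the notation here is illustrative and not taken from the paper. The Polyak-Łojasiewicz inequality requires that, for some $\mu > 0$,
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert_2^2 \;\ge\; \mu\,\bigl(f(x) - f^\star\bigr)
  \qquad \text{for all } x,
\]
where $f^\star$ is the minimum of the objective $f$. Standard (unbiased) stochastic rounding to a fixed-point grid with spacing $\varepsilon$ rounds a value $x$ to one of its two neighbouring grid points with probabilities proportional to the distances,
\[
  \operatorname{SR}(x) \;=\;
  \begin{cases}
    \lfloor x \rfloor_\varepsilon, & \text{with probability } 1 - p(x),\\[2pt]
    \lfloor x \rfloor_\varepsilon + \varepsilon, & \text{with probability } p(x) = \dfrac{x - \lfloor x \rfloor_\varepsilon}{\varepsilon},
  \end{cases}
  \qquad \mathbb{E}\bigl[\operatorname{SR}(x)\bigr] = x,
\]
where $\lfloor x \rfloor_\varepsilon$ denotes the largest grid point not exceeding $x$. A biased stochastic rounding strategy instead skews these probabilities so that the expected rounding error is nonzero, which is the property exploited in the analysis.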