When implementing the gradient descent method in low precision, the use of stochastic rounding schemes helps to prevent stagnation of convergence caused by the vanishing gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero-bias property for a larger probability of preserving small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network with an 8-bit floating-point format.
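To make the rounding mechanism concrete, the sketch below illustrates unbiased stochastic rounding on a uniform grid: a value is rounded up with probability equal to its relative distance to the lower grid point, so small gradient updates survive in expectation even when round-to-nearest would discard them. This is only an illustrative toy (the grid spacing, learning rate, and gradient value are made up; real floating-point formats have exponent-dependent spacing), not the rounding schemes proposed in the paper.

```python
import numpy as np

def stochastic_round(x, spacing):
    """Unbiased stochastic rounding of x to a uniform grid with the given spacing.

    The value is rounded up with probability equal to its relative distance
    from the lower grid point, so the rounding error has zero mean.
    """
    lower = np.floor(x / spacing) * spacing
    p_up = (x - lower) / spacing          # probability of rounding up
    round_up = np.random.random(np.shape(x)) < p_up
    return lower + round_up * spacing

# Toy example: an update much smaller than half the grid spacing.
w, lr, grad, spacing = 1.0, 0.01, 0.04, 2.0**-7   # lr*grad = 4e-4 << spacing/2
# Round-to-nearest returns 1.0, i.e. the update is lost and the iterate stagnates;
# stochastic rounding preserves the update in expectation.
samples = [stochastic_round(w - lr * grad, spacing) for _ in range(10_000)]
print(np.mean(samples))   # close to 0.9996 on average
```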