Employing stochastic rounding schemes helps prevent the stagnation of convergence caused by the vanishing-gradient effect that arises when the gradient descent method is implemented in low precision. Conventional stochastic rounding achieves zero bias by preserving small updates with probabilities proportional to their relative magnitudes. In this study, we propose a new stochastic rounding scheme that trades the zero-bias property for a larger probability of preserving small gradients. Our method yields a constant rounding bias that, at each iteration, lies in a descent direction. For convex problems, we prove that the proposed rounding scheme has a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network in an 8-bit floating-point format.
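To make the conventional scheme concrete, below is a minimal NumPy sketch of unbiased stochastic rounding on a uniform fixed-point grid. The grid, the bit width, and the name `stochastic_round` are illustrative assumptions; the paper itself works with an 8-bit floating-point format, and the proposed biased scheme is not shown here.

```python
import numpy as np

def stochastic_round(x, frac_bits=8, rng=None):
    """Unbiased stochastic rounding to a uniform fixed-point grid.

    Each value is rounded up with probability equal to its fractional
    remainder on the grid, so the expected result equals the input
    (zero rounding bias) and small updates survive with probability
    proportional to their magnitude.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits
    scaled = np.asarray(x, dtype=np.float64) * scale
    lower = np.floor(scaled)
    p_up = scaled - lower            # distance to the lower grid point
    up = rng.random(scaled.shape) < p_up
    return (lower + up) / scale

# Example: a gradient update too small for the grid to resolve exactly.
w, g, lr = 0.5, 0.001, 1.0
samples = [stochastic_round(w - lr * g, frac_bits=8) for _ in range(10000)]
print(np.mean(samples))  # approximately 0.499, matching the exact update
```

With deterministic round-to-nearest, the same update would be rounded back to 0.5 at every step, which is the stagnation the abstract refers to; stochastic rounding keeps the update in expectation.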