Motivated by neural network training in low-bit floating-point and fixed-point environments, this work studies the convergence of variants of SGD under computational error. For a general stochastic loss function that is Lipschitz continuous, a novel convergence result to a Clarke stationary point is presented, assuming that only an approximation of the stochastic gradient can be computed and that the SGD step itself is computed with error. Different variants of SGD are then tested empirically in a variety of low-precision arithmetic environments, where improved test-set accuracy relative to plain SGD is observed on two image recognition tasks.
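To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of an SGD step in which both the stochastic gradient and the update itself are only computed approximately; the `quantize` helper, the fixed-point `scale`, and the toy objective are all illustrative assumptions introduced here to simulate low-precision arithmetic error.

```python
import numpy as np

def quantize(x, scale=2**-8):
    """Round to a fixed-point grid with spacing `scale` (nearest rounding),
    a simple stand-in for low-precision arithmetic error."""
    return np.round(x / scale) * scale

def noisy_sgd_step(w, grad_fn, lr=0.05, scale=2**-8, rng=None):
    """One SGD step where the stochastic gradient and the step itself
    are both computed with (quantization) error."""
    rng = rng or np.random.default_rng()
    g = grad_fn(w, rng)                  # stochastic gradient estimate
    g_hat = quantize(g, scale)           # error in the computed gradient
    step = quantize(lr * g_hat, scale)   # error in computing the SGD step
    return quantize(w - step, scale)     # parameters stored in low precision

# Toy usage (hypothetical objective): minimize E[(w - z)^2] with z ~ N(1, 0.1)
if __name__ == "__main__":
    w = np.array([5.0])
    grad = lambda w, rng: 2.0 * (w - rng.normal(1.0, 0.1, size=w.shape))
    for _ in range(2000):
        w = noisy_sgd_step(w, grad)
    print(w)  # approaches 1.0, up to precision-induced error
```

In this sketch the iterates can only converge to a neighborhood of the minimizer whose size is governed by the quantization scale, which mirrors the kind of error-dependent guarantee the abstract refers to.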