Motivated by neural network training in low-bit floating-point and fixed-point environments, this work studies the convergence of variants of SGD under computational error. For a general stochastic Lipschitz continuous loss function, a novel convergence result to Clarke stationary points is presented, assuming that only an approximation of the stochastic gradient can be computed and that the SGD step itself is computed with error. Different variants of SGD are then tested empirically in a variety of low-precision arithmetic environments, achieving improved test set accuracy compared to SGD on two image recognition tasks.
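To make the setting concrete, the following is a minimal sketch of SGD with computational error in both the stochastic gradient and the update step, using a simple fixed-point-style rounding as an illustrative stand-in for the low-precision environments the paper discusses; the function names (`quantize`, `sgd_with_error`), bit width, and toy loss are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def quantize(x, num_bits=8, scale=1.0):
    """Round to a fixed-point grid: a crude stand-in for low-bit arithmetic."""
    step = scale / (2 ** (num_bits - 1))
    return np.clip(np.round(x / step) * step, -scale, scale - step)

def sgd_with_error(w0, stoch_grad, lr=0.01, num_steps=1000, num_bits=8):
    """SGD where both the gradient and the step are computed with rounding error."""
    w = quantize(np.asarray(w0, dtype=float), num_bits)
    for _ in range(num_steps):
        g = quantize(stoch_grad(w), num_bits)   # approximate stochastic gradient
        w = quantize(w - lr * g, num_bits)      # error in the SGD step itself
    return w

# Toy usage: noisy (sub)gradients of the nonsmooth loss f(w) = |w|.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = lambda w: np.sign(w) + 0.1 * rng.standard_normal(w.shape)
    print(sgd_with_error(np.array([0.5]), grad, lr=0.01, num_steps=500))
```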