Deep learning models dominate almost all artificial intelligence tasks, such as vision, text, and speech processing. Stochastic Gradient Descent (SGD) is the main tool for training such models, and the computations are usually performed in single-precision floating-point format. The convergence of single-precision SGD normally matches the theoretical results derived for real numbers, since single-precision arithmetic introduces negligible rounding error. However, the numerical error grows when the computations are performed in low-precision number formats. This provides a compelling reason to study SGD convergence adapted to low-precision computation. We present both deterministic and stochastic analyses of the SGD algorithm, obtaining bounds that show the effect of the number format. Such bounds can provide guidelines on how SGD convergence is affected when constraints make high-precision computation infeasible.
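The rounding error referred to above can be made concrete with a small experiment. The sketch below is a minimal illustration, not the analysis from the paper: it runs SGD on an assumed least-squares objective while rounding every gradient and weight update to float16, so the accumulated low-precision error can be observed directly. The `quantize` helper and the problem setup are assumptions chosen purely for illustration.

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm): SGD on a simple
# quadratic objective f(w) = 0.5 * ||A w - b||^2, with every intermediate
# quantity rounded to a low-precision format (here float16) to mimic the
# rounding error discussed in the abstract.

def quantize(x, dtype=np.float16):
    """Round-to-nearest into a low-precision format, then back to float64."""
    return x.astype(dtype).astype(np.float64)

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
b = rng.standard_normal(100)

w = np.zeros(10)
lr = 1e-3
for step in range(1000):
    i = rng.integers(0, 100)            # sample one data point
    grad = (A[i] @ w - b[i]) * A[i]     # stochastic gradient of 0.5*(a_i^T w - b_i)^2
    grad = quantize(grad)               # low-precision gradient
    w = quantize(w - lr * grad)         # low-precision weight update

print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```

Rerunning the loop with `dtype=np.float32` (or removing the `quantize` calls) gives a point of comparison for how much the coarser format slows the decrease of the loss.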