While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Langevin Dynamics (SGLD), showing that its costs can be significantly reduced without sacrificing performance, due to its intrinsic ability to handle system noise. We prove that the convergence of low-precision SGLD with full-precision gradient accumulators is less affected by the quantization error than its SGD counterpart in the strongly convex setting. To further enable low-precision gradient accumulators, we develop a new quantization function for SGLD that preserves the variance in each update step. We demonstrate that low-precision SGLD achieves comparable performance to full-precision SGLD with only 8 bits on a variety of deep learning tasks.
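As a rough illustration of the setting described above (not the paper's exact quantization function), the sketch below shows one low-precision SGLD step that keeps a full-precision gradient accumulator and stores a stochastically rounded low-precision copy of the weights. The helper names and hyperparameters (`step`, `clip`, `temperature`) are illustrative assumptions, not part of the original abstract.

```python
import torch

def stochastic_round(x, step):
    """Unbiased stochastic rounding of x onto a grid with spacing `step` (assumed quantizer)."""
    scaled = x / step
    floor = torch.floor(scaled)
    prob = scaled - floor                       # probability of rounding up
    return (floor + (torch.rand_like(x) < prob).float()) * step

def low_precision_sgld_step(w_fp, grad, lr, temperature=1.0, step=2**-6, clip=1.0):
    """
    One SGLD update with a full-precision accumulator w_fp.
    Standard SGLD update: w <- w - lr * grad + sqrt(2 * lr * T) * N(0, I).
    Returns the updated accumulator and a low-precision copy for the forward pass.
    """
    noise = torch.randn_like(w_fp) * (2.0 * lr * temperature) ** 0.5
    w_fp = w_fp - lr * grad + noise             # accumulate update in full precision
    w_fp = torch.clamp(w_fp, -clip, clip)       # keep weights in the representable range
    w_lp = stochastic_round(w_fp, step)         # quantize for low-precision storage/compute
    return w_fp, w_lp
```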