In this paper, we present a distributed variant of the adaptive stochastic gradient method (Adam) for training deep neural networks in the parameter-server model. To reduce the communication cost between the workers and the server, we incorporate two types of quantization schemes, namely gradient quantization and weight quantization, into the proposed distributed Adam. In addition, to reduce the bias introduced by the quantization operations, we propose an error-feedback technique that compensates for the quantized gradient. Theoretically, in the stochastic nonconvex setting, we show that the distributed adaptive gradient method with gradient quantization and error-feedback converges to a first-order stationary point, and that the distributed adaptive gradient method with weight quantization and error-feedback converges to a point related to the quantization level, under both the single-worker and multi-worker modes. Finally, we apply the proposed distributed adaptive gradient methods to train deep neural networks, and the experimental results demonstrate the efficacy of our methods.
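To make the gradient-quantization-with-error-feedback idea concrete, the following is a minimal Python sketch of one plausible worker/server interaction: each worker adds its accumulated quantization residual to the fresh stochastic gradient before quantizing, and the server averages the quantized gradients and applies an Adam step. The stochastic uniform quantizer, the class names (`ErrorFeedbackWorker`, `AdamServer`), and the hyperparameters are illustrative assumptions, not the paper's exact algorithm or notation.

```python
import numpy as np

def quantize(v, num_levels=16):
    """Stochastic uniform quantizer (illustrative stand-in for the compression operator Q)."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v.copy()
    scaled = np.abs(v) / norm * num_levels
    lower = np.floor(scaled)
    prob = scaled - lower
    levels = lower + (np.random.rand(*v.shape) < prob)
    return np.sign(v) * levels * norm / num_levels

class ErrorFeedbackWorker:
    """One worker: compresses its stochastic gradient with error compensation."""
    def __init__(self, dim):
        self.residual = np.zeros(dim)          # accumulated quantization error e_t

    def compress(self, grad):
        corrected = grad + self.residual       # g_t + e_t
        q = quantize(corrected)                # Q(g_t + e_t), sent to the server
        self.residual = corrected - q          # e_{t+1} = (g_t + e_t) - Q(g_t + e_t)
        return q

class AdamServer:
    """Parameter server: averages quantized gradients and applies an Adam update."""
    def __init__(self, dim, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.w = np.zeros(dim)
        self.m = np.zeros(dim)
        self.v = np.zeros(dim)
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0

    def step(self, quantized_grads):
        g = np.mean(quantized_grads, axis=0)   # aggregate over workers
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        self.w -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return self.w

# Toy usage: two workers optimizing a quadratic f(w) = ||w - 1||^2 / 2 with noisy gradients.
dim = 10
workers = [ErrorFeedbackWorker(dim) for _ in range(2)]
server = AdamServer(dim, lr=1e-2)
for _ in range(500):
    grads = [(server.w - 1.0) + 0.01 * np.random.randn(dim) for _ in workers]
    server.step([wk.compress(g) for wk, g in zip(workers, grads)])
```

In this sketch the residual term plays the role of the error-feedback compensation: the portion of the gradient lost to quantization at one round is re-injected at the next round rather than discarded.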