Compressed Stochastic Gradient Descent (SGD) algorithms have recently been proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning. Existing compressed SGD algorithms assume the use of non-adaptive step-sizes (constant or diminishing) to provide theoretical convergence guarantees. In practice, the step-sizes are typically fine-tuned to the dataset and the learning algorithm to obtain good empirical performance. Such fine-tuning might be impractical in many learning scenarios, and it is therefore of interest to study compressed SGD with adaptive step-sizes. Motivated by prior work on adaptive step-size methods that train neural networks efficiently with uncompressed SGD, we develop an adaptive step-size method for compressed SGD. In particular, we introduce a scaling technique for the descent step in compressed SGD, which we use to establish order-optimal convergence rates for convex-smooth and strongly convex-smooth objectives under an interpolation condition, and for non-convex objectives under a strong growth condition. We also show through simulation examples that without this scaling, the algorithm can fail to converge. We present experimental results on deep neural networks trained on real-world datasets, comparing our proposed algorithm with previously proposed compressed SGD methods in the literature, and demonstrate improved performance on ResNet-18, ResNet-34, and DenseNet architectures for the CIFAR-100 and CIFAR-10 datasets at various levels of compression.
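For concreteness, the sketch below illustrates the general setting the abstract describes: a single compressed SGD update that combines a top-k gradient compressor with a backtracking (Armijo-type) adaptive step-size and a scaling factor on the compressed descent direction. It is not the paper's algorithm; the compressor choice, the function and parameter names (`top_k`, `gamma`, `eta_max`, `beta`, `c`), and the specific line-search rule are all illustrative assumptions.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest
    (a commonly used biased compressor; an illustrative choice here)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def compressed_sgd_step(w, grad_fn, loss_fn, k, gamma,
                        eta_max=1.0, beta=0.8, c=0.5, eta_min=1e-8):
    """One illustrative compressed SGD step with an adaptive step-size.

    grad_fn(w): stochastic gradient on the current mini-batch.
    loss_fn(w): loss on the same mini-batch.
    gamma:      scaling factor applied to the compressed descent direction
                (hypothetical form; the paper's scaling is not specified here).
    """
    g = grad_fn(w)
    d = gamma * top_k(g, k)  # compressed, scaled descent direction

    # Backtracking line search: shrink eta until a sufficient-decrease
    # (Armijo-type) condition holds on this mini-batch.
    eta = eta_max
    f_w = loss_fn(w)
    while eta > eta_min and loss_fn(w - eta * d) > f_w - c * eta * np.dot(g, d):
        eta *= beta

    return w - eta * d
```

As a quick usage example, applying this step repeatedly to a least-squares objective (with `grad_fn` and `loss_fn` built from the same sampled mini-batch at each iteration) converges without manual step-size tuning, which is the kind of behavior the adaptive step-size is meant to provide.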