This paper considers decentralized optimization with application to machine learning on graphs. The growing size of neural network (NN) models has motivated prior work on decentralized stochastic gradient algorithms to incorporate communication compression. On the other hand, recent works have demonstrated the favorable convergence and generalization properties of overparameterized NNs. In this work, we present an empirical analysis of the performance of compressed decentralized stochastic gradient (DSG) algorithms with overparameterized NNs. Through simulations in an MPI network environment, we observe that the convergence rates of popular compressed DSG algorithms are robust to the size of the NNs. Our findings suggest a gap between the theory and practice of compressed DSG algorithms in the existing literature.