We study COMP-AMS, a distributed optimization framework based on gradient averaging and the adaptive AMSGrad algorithm. Gradient compression with error feedback is applied to reduce the communication cost of gradient transmission. Our convergence analysis of COMP-AMS shows that this compressed gradient averaging strategy attains the same convergence rate as standard AMSGrad, and also exhibits a linear speedup with respect to the number of local workers. Compared with recently proposed protocols for distributed adaptive methods, COMP-AMS is simple and convenient. Numerical experiments are conducted to justify the theoretical findings, and they demonstrate that the proposed method can achieve the same test accuracy as full-gradient AMSGrad with substantial communication savings. With its simplicity and efficiency, COMP-AMS can serve as a useful distributed training framework for adaptive gradient methods.
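To make the ingredients named above concrete, the following is a minimal NumPy sketch of one plausible realization: each worker applies a top-k compressor with error feedback to its stochastic gradient, and a server averages the compressed gradients and takes an AMSGrad step. All class and function names here are illustrative assumptions, not the paper's code, and the actual COMP-AMS algorithm may differ in details such as the choice of compressor or hyperparameters.

```python
import numpy as np

def topk_compress(v, k):
    """Illustrative compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

class Worker:
    """Local worker: compresses its gradient with error feedback."""
    def __init__(self, dim, k):
        self.e = np.zeros(dim)  # accumulated compression residual
        self.k = k

    def compress(self, grad):
        corrected = grad + self.e             # add back previously untransmitted mass
        msg = topk_compress(corrected, self.k)
        self.e = corrected - msg              # store what was not transmitted
        return msg

class AMSGradServer:
    """Server: averages compressed gradients and applies an AMSGrad update."""
    def __init__(self, dim, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m = np.zeros(dim)       # first-moment estimate
        self.v = np.zeros(dim)       # second-moment estimate
        self.v_hat = np.zeros(dim)   # running max of v (the AMSGrad correction)
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps

    def step(self, params, msgs):
        g = np.mean(msgs, axis=0)    # average the compressed worker gradients
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g ** 2
        self.v_hat = np.maximum(self.v_hat, self.v)
        return params - self.lr * self.m / (np.sqrt(self.v_hat) + self.eps)

# Toy usage (hypothetical setup): 4 workers minimizing 0.5*||x - x_star||^2
# with noisy local gradients, each transmitting only k of dim coordinates.
rng = np.random.default_rng(0)
dim, n_workers, k = 50, 4, 5
x_star = rng.normal(size=dim)
x = np.zeros(dim)
workers = [Worker(dim, k) for _ in range(n_workers)]
server = AMSGradServer(dim, lr=0.1)
for t in range(500):
    msgs = [w.compress((x - x_star) + 0.1 * rng.normal(size=dim)) for w in workers]
    x = server.step(x, msgs)
print("final error:", np.linalg.norm(x - x_star))
```

In this sketch, each worker transmits only k out of dim coordinates per round, which is where the communication savings come from; the error-feedback buffer `e` ensures the discarded coordinates are eventually applied rather than lost.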