To address the communication bottleneck problem in distributed optimization within a master-worker framework, we propose LocalNewton, a distributed second-order algorithm with local averaging. In LocalNewton, the worker machines update their model in every iteration by finding a suitable second-order descent direction using only the data and model stored in their own local memory. We let the workers run multiple such iterations locally and communicate their models to the master node only once every few (say L) iterations. LocalNewton is highly practical since it requires only one hyperparameter, the number L of local iterations. We use novel matrix concentration-based techniques to obtain theoretical guarantees for LocalNewton, and we validate them with detailed empirical evaluation. To further enhance practicality, we devise an adaptive scheme for choosing L, and we show that this reduces the number of local iterations performed by the workers between two model synchronizations as the training proceeds, successively refining the model quality at the master. Via extensive experiments using several real-world datasets with AWS Lambda workers and an AWS EC2 master, we show that LocalNewton requires fewer than 60% of the communication rounds (between master and workers) and less than 40% of the end-to-end running time, compared to state-of-the-art algorithms, to reach the same training~loss.
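To make the communication pattern concrete, the following is a minimal single-machine sketch of the local-update-then-average loop described above. The regularized logistic-regression loss, the fixed step size, and the helper names (`local_grad_hess`, `local_newton`) are illustrative assumptions rather than the paper's exact formulation; in particular, the adaptive choice of L mentioned above is omitted and L is kept fixed.

```python
# Minimal sketch (assumed setup): each simulated worker holds one data shard,
# runs L Newton steps using only its own data, and the master then averages
# the worker models -- that averaging step corresponds to one communication round.
import numpy as np

def local_grad_hess(w, X, y, lam):
    """Gradient and Hessian of an L2-regularized logistic loss on one worker's shard."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-y * z))                 # sigmoid(y * x^T w)
    g = -(X.T @ (y * (1.0 - p))) / len(y) + lam * w  # local gradient
    D = p * (1.0 - p)                                # per-sample Hessian weights
    H = (X.T * D) @ X / len(y) + lam * np.eye(X.shape[1])
    return g, H

def local_newton(shards, w0, L=4, rounds=10, lam=1e-3, step=1.0):
    """Each round: every worker takes L local Newton steps on its own shard,
    then the master averages the worker models (one communication round)."""
    w_master = w0.copy()
    for _ in range(rounds):
        local_models = []
        for X, y in shards:                          # one loop body per worker
            w = w_master.copy()
            for _ in range(L):                       # L communication-free local iterations
                g, H = local_grad_hess(w, X, y, lam)
                w -= step * np.linalg.solve(H, g)    # local second-order descent direction
            local_models.append(w)
        w_master = np.mean(local_models, axis=0)     # model averaging at the master
    return w_master

if __name__ == "__main__":
    # Toy usage: 4 workers, each with its own shard of synthetic data.
    rng = np.random.default_rng(0)
    d = 10
    w_true = rng.normal(size=d)
    shards = []
    for _ in range(4):
        X = rng.normal(size=(200, d))
        y = np.sign(X @ w_true + 0.1 * rng.normal(size=200))
        shards.append((X, y))
    w = local_newton(shards, np.zeros(d), L=4, rounds=5)
```

In this sketch, every evaluation of `np.mean` over the worker models stands in for one master-worker communication round; all Newton steps in between touch only the locally stored data and model, which is the source of the communication savings the abstract reports.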