In recent centralized nonconvex distributed learning and federated learning, local methods are one of the promising approaches for reducing communication time. However, existing work has mainly focused on first-order optimality guarantees. On the other hand, algorithms with second-order optimality guarantees have been extensively studied in the non-distributed optimization literature. In this paper, we study a new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD), which combines the existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points in centralized nonconvex distributed optimization. BVR-L-PSGD enjoys second-order optimality with nearly the same communication complexity as the best known communication complexity of BVR-L-SGD for finding first-order optimal points. In particular, the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local losses. In an extreme case, the communication complexity approaches $\widetilde \Theta(1)$ as the heterogeneity of the local datasets goes to zero.
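As a rough schematic (our own illustrative sketch, not the paper's exact recursion), a local step that combines a recursive bias-variance reduced gradient estimator with parameter perturbation can be written as
$$
x^{(p)}_{t+1} = x^{(p)}_t - \eta \bigl( v^{(p)}_t + \xi_t \bigr), \qquad
v^{(p)}_t = \nabla f_{I_t}\bigl(x^{(p)}_t\bigr) - \nabla f_{I_t}\bigl(x^{(p)}_{t-1}\bigr) + v^{(p)}_{t-1}, \qquad
\xi_t \sim \mathrm{Unif}\bigl(B(0, r)\bigr),
$$
where $p$ indexes a worker, $\eta$ is the local step size, $I_t$ is a minibatch drawn from the local dataset, and $r$ is the perturbation radius; workers periodically communicate to synchronize the iterates and refresh the estimator. All symbols in this display are illustrative placeholders rather than the notation of the paper.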