EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback

Error feedback (EF), also known as error compensation, is an immensely popular convergence stabilization mechanism in the context of distributed training of supervised machine learning models enhanced by the use of contractive communication compression mechanisms, such as Top-$k$. First proposed by Seide et al. (2014) as a heuristic, EF resisted any theoretical understanding until recently [Stich et al., 2018, Alistarh et al., 2018]. However, all existing analyses either i) apply to the single node setting only, ii) rely on very strong and often unreasonable assumptions, such as global boundedness of the gradients, or iterate-dependent assumptions that cannot be checked a priori and may not hold in practice, or iii) circumvent these issues via the introduction of additional unbiased compressors, which increase the communication cost. In this work we fix all these deficiencies by proposing and analyzing a new EF mechanism, which we call EF21, and which consistently and substantially outperforms EF in practice. Our theoretical analysis relies on standard assumptions only, works in the distributed heterogeneous data setting, and leads to better and more meaningful rates. In particular, we prove that EF21 enjoys a fast $O(1/T)$ convergence rate for smooth nonconvex problems, beating the previous bound of $O(1/T^{2/3})$, which was shown under a bounded gradients assumption. We further improve this to a fast linear rate for PL functions, which is the first linear convergence result for an EF-type method not relying on unbiased compressors. Since EF has a large number of applications where it reigns supreme, we believe that our 2021 variant, EF21, can have a large impact on the practice of communication-efficient distributed learning.
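
The abstract does not spell out the EF21 update, so the following is only a minimal sketch under stated assumptions. It assumes the form we understand from the EF21 paper: each worker $i$ keeps a gradient estimate $g_i$, transmits only the compressed difference $c_i = C(\nabla f_i(x^{t+1}) - g_i^t)$, and the server steps along the average of the $g_i$. Top-$k$ serves as the contractive compressor $C$, and the run uses hypothetical heterogeneous least-squares data. The helper names (`top_k`, `ef21`), the initialization $g_i^0 = C(\nabla f_i(x^0))$, the toy data, and the conservative step-size formula (our reading of the PL-case condition) are all illustrative assumptions, not a reference implementation.

```python
import numpy as np


def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Contractive Top-k compressor: keep the k largest-magnitude entries of v."""
    k = min(k, v.size)
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
    return out


def ef21(grads, x0, gamma, k, steps):
    """Sketch of EF21-style gradient descent with n workers (assumed update rule).

    grads[i](x) returns the local gradient of f_i at x. Worker i keeps a state
    g_i and communicates only c_i = top_k(grad f_i(x) - g_i); worker and server
    both apply g_i <- g_i + c_i, and the server steps along the mean of the g_i.
    """
    x = x0.copy()
    # Assumed initialization: g_i^0 = top_k(grad f_i(x^0)).
    g_local = [top_k(grad(x), k) for grad in grads]
    g = np.mean(g_local, axis=0)
    for _ in range(steps):
        x = x - gamma * g
        for i, grad in enumerate(grads):
            c = top_k(grad(x) - g_local[i], k)  # the only message worker i sends
            g_local[i] = g_local[i] + c
        g = np.mean(g_local, axis=0)
    return x


# Hypothetical toy problem: heterogeneous local losses f_i(x) = 0.5 * ||A_i x - b_i||^2.
rng = np.random.default_rng(0)
n, m, d, k = 4, 20, 10, 2
As = [rng.standard_normal((m, d)) for _ in range(n)]
bs = [rng.standard_normal(m) for _ in range(n)]
grads = [lambda x, A=A, b=b: A.T @ (A @ x - b) for A, b in zip(As, bs)]


def full_grad(x):
    """Gradient of the average loss f = (1/n) * sum_i f_i."""
    return np.mean([grad(x) for grad in grads], axis=0)


# Smoothness/PL constants of the toy problem and a conservative step size (assumption).
H = np.mean([A.T @ A for A in As], axis=0)
eig = np.linalg.eigvalsh(H)  # ascending eigenvalues
L, mu = eig[-1], eig[0]
L_tilde = np.sqrt(np.mean([np.linalg.eigvalsh(A.T @ A)[-1] ** 2 for A in As]))
alpha = k / d  # Top-k contraction parameter
theta = 1 - np.sqrt(1 - alpha)
beta = (1 - alpha) / (1 - np.sqrt(1 - alpha))
gamma = min(1 / (L + L_tilde * np.sqrt(2 * beta / theta)), theta / (2 * mu))

x0 = np.zeros(d)
x_T = ef21(grads, x0, gamma, k, steps=5000)
x_star = np.linalg.solve(H, np.mean([A.T @ b for A, b in zip(As, bs)], axis=0))
print(np.linalg.norm(full_grad(x0)), np.linalg.norm(full_grad(x_T)))
```

Unlike classic EF, which accumulates the compression error in a separate buffer, in this sketch the state $g_i$ itself tracks the local gradient, so the compressed differences shrink as the iterates converge.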
