Variational inequalities in general and saddle point problems in particular are increasingly relevant in machine learning applications, including adversarial learning, GANs, transport and robust optimization. With the increasing data and problem sizes necessary to train high-performing models across these and other applications, it is necessary to rely on parallel and distributed computing. However, in distributed training, communication among the compute nodes is a key bottleneck, and this problem is exacerbated for high-dimensional and over-parameterized models. Due to these considerations, it is important to equip existing methods with strategies that reduce the volume of transmitted information during training while obtaining a model of comparable quality. In this paper, we present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2. Our theory and methods allow for the use of both unbiased (such as Rand$k$; MASHA1) and contractive (such as Top$k$; MASHA2) compressors. We empirically validate our conclusions using two experimental setups: a standard bilinear min-max problem, and large-scale distributed adversarial training of transformers.
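To make the two compressor classes concrete, the following is a minimal NumPy sketch (our own illustration, not the paper's implementation; the names `rand_k` and `top_k` are ours) of the canonical examples mentioned above. Rand$k$ keeps $k$ coordinates chosen uniformly at random and rescales by $d/k$, which makes it unbiased, $\mathbb{E}[C(x)] = x$; Top$k$ keeps the $k$ largest-magnitude coordinates and is biased but contractive, satisfying $\|C(x) - x\|^2 \le (1 - k/d)\|x\|^2$.

```python
import numpy as np

def rand_k(x, k, rng):
    """Rand-k compressor: keep k coordinates uniformly at random,
    rescaled by d/k so that E[C(x)] = x (unbiased)."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

def top_k(x, k):
    """Top-k compressor: keep the k largest-magnitude coordinates.
    Biased, but contractive: ||C(x) - x||^2 <= (1 - k/d) ||x||^2."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# Example usage: compress a random vector, transmitting only k of d entries.
rng = np.random.default_rng(0)
x = rng.standard_normal(10)
print(rand_k(x, 3, rng))
print(top_k(x, 3))
```

In a distributed setting, each node would apply such an operator to its locally computed message before sending it over the network, so only $k$ of $d$ entries (plus their indices) are transmitted per round.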