Variational inequalities in general, and saddle point problems in particular, are increasingly relevant in machine learning applications, including adversarial learning, GANs, optimal transport, and robust optimization. As the data and problem sizes required to train high-performing models keep growing across applications, we need to rely on parallel and distributed computing. However, in distributed training, communication among the compute nodes is a key bottleneck, and this problem is exacerbated for high-dimensional and over-parameterized models. Due to these considerations, it is important to equip existing methods with strategies that reduce the volume of transmitted information during training while still obtaining a model of comparable quality. In this paper, we present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2. Our theory and methods allow for the use of both unbiased (such as Rand$k$; MASHA1) and contractive (such as Top$k$; MASHA2) compressors. The new algorithms support bidirectional compression and can also be modified for the stochastic setting with batches and for federated learning with partial participation of clients. We empirically validate our conclusions using two experimental setups: a standard bilinear min-max problem and large-scale distributed adversarial training of transformers.
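For reference, a minimal sketch of the two compressor classes mentioned above, using the standard definitions from the compression literature (the parameters $\omega$ and $\delta$ below follow the usual conventions and are given here for illustration, not as the paper's own notation): an unbiased compressor $Q$ (e.g., Rand$k$) satisfies
$$\mathbb{E}[Q(x)] = x, \qquad \mathbb{E}\|Q(x)\|^2 \le \omega \|x\|^2 \quad \text{for some } \omega \ge 1,$$
while a contractive compressor $C$ (e.g., Top$k$) satisfies
$$\mathbb{E}\|C(x) - x\|^2 \le (1 - \delta)\|x\|^2 \quad \text{for some } \delta \in (0, 1].$$
For instance, Rand$k$ keeps $k$ uniformly chosen coordinates of $x \in \mathbb{R}^d$ scaled by $d/k$ (giving $\omega = d/k$), whereas Top$k$ keeps the $k$ largest-magnitude coordinates (giving $\delta = k/d$).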