Variational inequalities in general, and saddle point problems in particular, are increasingly relevant in machine learning applications, including adversarial learning, GANs, transport, and robust optimization. With the increasing data and problem sizes necessary to train high-performing models across various applications, we need to rely on parallel and distributed computing. However, in distributed training, communication among the compute nodes is a key bottleneck, and this problem is exacerbated for high-dimensional and over-parameterized models. For these reasons, it is important to equip existing methods with strategies that reduce the volume of information transmitted during training while still obtaining a model of comparable quality. In this paper, we present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems with compressed communication: MASHA1 and MASHA2. Our theory and methods allow for the use of both unbiased compressors (such as Rand$k$; MASHA1) and contractive compressors (such as Top$k$; MASHA2). The new algorithms support bidirectional compression and can also be adapted to the stochastic setting with batches and to federated learning with partial participation of clients. We empirically validate our conclusions in two experimental setups: a standard bilinear min-max problem and large-scale distributed adversarial training of transformers.
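For concreteness, below is a minimal NumPy sketch of the two compressor classes mentioned above, using the standard definitions of Rand$k$ (unbiased) and Top$k$ (contractive); this is illustrative only and not code from the paper.

```python
import numpy as np

def rand_k(x, k, rng=None):
    """Unbiased Rand-k compressor: keep k uniformly random coordinates and
    rescale by d/k so that E[rand_k(x)] = x (standard unbiased sparsifier)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = (d / k) * x[idx]
    return out

def top_k(x, k):
    """Contractive Top-k compressor: keep the k largest-magnitude coordinates.
    Biased, but satisfies ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# Example: compress a local vector (e.g., an operator estimate) before communication.
x = np.random.randn(10)
print(rand_k(x, 3))  # unbiased in expectation; only 3 nonzero entries are transmitted
print(top_k(x, 3))   # deterministic; keeps the 3 largest-magnitude entries
```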