We consider decentralized stochastic variational inequalities in which the problem data is distributed across many participating devices (the heterogeneous, or non-IID, data setting). We propose a novel method, based on stochastic extra-gradient, in which participating devices can communicate over arbitrary, possibly time-varying network topologies. This covers both the fully decentralized optimization setting and the centralized topologies commonly used in Federated Learning. Our method further supports multiple local updates on the workers, which reduces the communication frequency between workers. We theoretically analyze the proposed scheme in the strongly monotone, monotone, and non-monotone settings. As a special case, our method and analysis apply to decentralized stochastic min-max problems, which are being studied with increasing interest in Deep Learning. For example, the training objectives of Generative Adversarial Networks (GANs) are typically saddle point problems, and the decentralized training of GANs has been reported to be extremely challenging. While state-of-the-art techniques rely on either repeated gossip rounds or proximal updates, we alleviate both of these requirements. Experimental results for decentralized GAN training demonstrate the effectiveness of our proposed algorithm.
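To make the ingredients named above concrete (extra-gradient steps, multiple local updates, and gossip over a time-varying topology), the following is a minimal illustrative sketch and not the paper's algorithm or its step-size schedule. Everything in it is an assumption chosen for illustration: the toy operator F_i(z) = A z + b_i (a strongly monotone saddle point problem with a shared matrix A and heterogeneous offsets b_i), the helper names `pair_averaging` and `decentralized_extragradient`, the two alternating pairwise-averaging mixing matrices, and all parameter values.

```python
import numpy as np


def pair_averaging(n, pairs):
    """Doubly stochastic mixing matrix that averages each listed pair of nodes."""
    W = np.eye(n)
    for i, j in pairs:
        W[i, i] = W[j, j] = 0.5
        W[i, j] = W[j, i] = 0.5
    return W


def decentralized_extragradient(A, b, mixing, gamma=0.05, local_steps=2,
                                steps=1000, sigma=0.0, seed=0):
    """Toy sketch: local extra-gradient steps with periodic gossip averaging.

    Node i holds the (assumed) operator F_i(z) = A z + b_i, observed with
    additive Gaussian noise of standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    n, d = b.shape
    Z = np.zeros((n, d))  # row i is the iterate of node i

    def oracle(Z):
        # Stochastic operator evaluated at every node's own iterate.
        return Z @ A.T + b + sigma * rng.standard_normal(Z.shape)

    rounds = 0
    for t in range(steps):
        Z_half = Z - gamma * oracle(Z)   # extrapolation step
        Z = Z - gamma * oracle(Z_half)   # update step using the look-ahead point
        if (t + 1) % local_steps == 0:
            # Communicate only every `local_steps` iterations; the mixing
            # matrix changes from round to round (time-varying topology).
            Z = mixing[rounds % len(mixing)] @ Z
            rounds += 1
    return Z


# Saddle problem f_i(x, y) = mu/2 x^2 + x y - mu/2 y^2 + c_i x - d_i y,
# whose operator (grad_x f_i, -grad_y f_i) equals A z + b_i with b_i = (c_i, d_i).
mu = 1.0
A = np.array([[mu, 1.0], [-1.0, mu]])
b = np.array([[1.0, 0.0], [-1.0, 2.0], [0.5, -1.0], [2.0, 1.0]])
mixing = [pair_averaging(4, [(0, 1), (2, 3)]),
          pair_averaging(4, [(1, 2), (3, 0)])]

Z = decentralized_extragradient(A, b, mixing)
z_star = -np.linalg.solve(A, b.mean(axis=0))  # solution of the averaged problem
print("node average:", Z.mean(axis=0), "target:", z_star)
```

Because the mixing matrices are doubly stochastic, gossip leaves the node average unchanged, so in this noise-free toy run the average follows centralized extra-gradient on the averaged operator. Individual nodes still drift apart between communication rounds, and that drift grows with `gamma`, `local_steps`, and the spread of the `b_i`. Setting `sigma` above zero gives the stochastic variant, where only approximate convergence should be expected.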