We study a variant of a recently introduced min-max optimization framework in which the max-player is constrained to update its parameters in a greedy manner until it reaches a first-order stationary point. Our equilibrium definition for this framework depends on a proposal distribution which the min-player uses to choose directions in which to update its parameters. We show that, given a smooth and bounded nonconvex-nonconcave objective function, access to any proposal distribution for the min-player's updates, and a stochastic gradient oracle for the max-player, our algorithm converges to the aforementioned approximate local equilibrium in a number of iterations that does not depend on the dimension. The equilibrium point found by our algorithm depends on the proposal distribution, and when applying our algorithm to train GANs we choose the proposal distribution to be a distribution of stochastic gradients. We empirically evaluate our algorithm on challenging nonconvex-nonconcave test functions and loss functions arising in GAN training. Our algorithm converges on these test functions and, when used to train GANs, trains stably on synthetic and real-world datasets and avoids mode collapse.
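The framework described above can be illustrated with a minimal sketch: an outer min-player that samples candidate update directions from a proposal distribution, and an inner max-player that ascends greedily until it reaches an approximate first-order stationary point. This is only an illustrative toy, not the paper's algorithm; the objective `f`, the Gaussian proposal, and all step sizes are assumptions made for the example.

```python
import numpy as np

# Toy smooth, bounded, nonconvex-nonconcave objective (an illustrative
# assumption; not a loss function from the paper).
def f(x, y):
    return np.sin(x) * np.cos(y)

def grad_y(x, y, eps=1e-6):
    # Finite-difference stand-in for the max-player's gradient oracle.
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

def greedy_max(x, y, lr=0.1, tol=1e-4, max_steps=1000):
    # Max-player updates greedily until it reaches an (approximate)
    # first-order stationary point in y.
    for _ in range(max_steps):
        g = grad_y(x, y)
        if abs(g) < tol:
            break
        y += lr * g
    return y

def min_max(x0, y0, steps=50, seed=0):
    # Min-player draws candidate update directions from a proposal
    # distribution (here a hypothetical Gaussian) and keeps a move only
    # if it lowers the objective after the max-player's greedy response.
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    for _ in range(steps):
        y = greedy_max(x, y)
        x_new = x + rng.normal(scale=0.1)
        y_new = greedy_max(x_new, y)
        if f(x_new, y_new) < f(x, y):
            x, y = x_new, y_new
    return x, y
```

The accept/reject step makes the min-player's progress depend on the proposal distribution, mirroring how the equilibrium reached depends on the proposal distribution in the framework above.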