On the One-sided Convergence of Adam-type Algorithms in Non-convex Non-concave Min-max Optimization

Adam-type methods, which extend adaptive gradient methods, have shown strong performance in training both supervised and unsupervised machine learning models. In particular, Adam-type optimizers are widely used in practice as the default tool for training generative adversarial networks (GANs). On the theory side, however, although existing results establish the efficiency of Adam-type methods in minimization problems, the reason for their strong performance in GAN training remains unexplained. Existing works have long regarded fast convergence as one of the most important reasons, and several of them give theoretical guarantees that min-max optimization algorithms converge to a critical point under certain assumptions. In this paper, we first argue empirically that in GAN training, Adam does not converge to a critical point even when training succeeds: only the generator converges, while the discriminator's gradient norm remains high throughout training. We call this one-sided convergence. We then bridge the gap between experiments and theory by showing that Adam-type algorithms provably converge to a one-sided first-order stationary point in min-max optimization problems under the one-sided MVI condition. We also verify empirically that this one-sided MVI condition is satisfied by standard GANs after training on standard datasets. To the best of our knowledge, this is the first result that provides both an empirical observation and a rigorous theoretical guarantee of the one-sided convergence of Adam-type algorithms in min-max optimization.
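
The abstract names its central notions without stating them formally. The block below is a hedged sketch of how they might be written, not the paper's own definitions. The symbols x (generator parameters), y (discriminator parameters), f, F, z and epsilon are notation assumed here for illustration, and MVI is read as the Minty variational inequality. The standard MVI line is the usual textbook form; the one-sided variants are one plausible reading of the abstract and may differ from the exact statements in the paper.

```latex
% Min-max problem (notation assumed: x = generator, y = discriminator).
\min_{x} \max_{y} f(x, y)

% Critical (first-order stationary) point: both partial gradients vanish.
\nabla_x f(x, y) = 0, \qquad \nabla_y f(x, y) = 0

% One-sided \epsilon-stationarity (assumed reading): only the min side
% is required to have a small gradient; the max side is left unconstrained.
\|\nabla_x f(x, y)\| \le \epsilon

% Standard MVI condition, with z = (x, y) and
% F(z) = (\nabla_x f(x, y), -\nabla_y f(x, y)):
\exists\, z^{*} : \quad \langle F(z),\, z - z^{*} \rangle \ge 0 \quad \forall z

% One-sided MVI (assumed reading): the inequality is imposed on the x block only.
\exists\, x^{*} : \quad \langle \nabla_x f(x, y),\, x - x^{*} \rangle \ge 0 \quad \forall (x, y)
```

Under this reading, the empirical observation in the abstract corresponds to the generator-side gradient norm becoming small during training while the discriminator-side gradient norm stays large.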
