Adversarial training, a special case of multi-objective optimization, is an increasingly prevalent machine learning technique: some of its most notable applications include GAN-based generative modeling and self-play techniques in reinforcement learning, which have been applied to complex games such as Go and Poker. In practice, a \emph{single} pair of networks is typically trained to find an approximate equilibrium of a highly nonconcave-nonconvex adversarial problem. However, while a classic result in game theory states that such an equilibrium exists in concave-convex games, there is no analogous guarantee when the payoff is nonconcave-nonconvex. Our main contribution is an approximate minimax theorem for a large class of games in which the players pick neural networks, including WGAN, StarCraft II, and the Blotto game. Our findings rely on the fact that, despite being nonconcave-nonconvex with respect to the neural networks' parameters, these games are concave-convex with respect to the actual models (e.g., functions or distributions) represented by these neural networks.
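To make the last point concrete, here is a minimal sketch of how this hidden concave-convex structure arises in the WGAN case; the notation ($p_{\mathrm{data}}$, $p_z$, $G_{\#}p_z$, $\theta_G$, $\theta_D$) is ours, introduced for illustration rather than drawn from the abstract above.
% Illustration in our own notation; a sketch, not the paper's formal statement.
\[
\min_{G}\;\max_{\|D\|_{L}\le 1}\;\;
\mathbb{E}_{x\sim p_{\mathrm{data}}}\!\big[D(x)\big]
\;-\;
\mathbb{E}_{z\sim p_{z}}\!\big[D(G(z))\big]
\]
Writing $p_G = G_{\#}p_z$ for the generated distribution, the payoff equals $\mathbb{E}_{x\sim p_{\mathrm{data}}}[D(x)] - \mathbb{E}_{x\sim p_G}[D(x)]$, which is linear in the function $D$ and linear in the distribution $p_G$. Over the convex sets of 1-Lipschitz functions and probability distributions, the game is therefore bilinear, hence concave-convex, even though the same payoff is nonconcave-nonconvex in the network parameters $(\theta_G,\theta_D)$, since the maps $\theta_D \mapsto D_{\theta_D}$ and $\theta_G \mapsto G_{\theta_G}$ are nonlinear.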