In this work, we study stochastic non-cooperative games in which only noisy black-box function evaluations are available to estimate each player's cost function. Since each player's cost depends on both its own decision variables and those of its rivals, most existing Nash-equilibrium-seeking methods require local information to be exchanged through a center or a communication network. We propose a new stochastic distributed learning algorithm that requires no communication among players. The proposed algorithm uses the simultaneous-perturbation method to estimate the gradient of each cost function, and the mirror descent method to search for the Nash equilibrium. We provide an asymptotic analysis of the bias and variance of the gradient estimates, and show that, for the class of strictly monotone games, the proposed algorithm converges to the Nash equilibrium in mean square at a rate faster than existing algorithms. The effectiveness of the proposed method is demonstrated in a numerical experiment.
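To make the two ingredients of the abstract concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm or experiment): each player estimates its own partial gradient from two noisy black-box evaluations via a simultaneous-perturbation (Rademacher-direction) finite difference, then takes a mirror descent step with the Euclidean mirror map, i.e., a projected gradient step. The quadratic two-player game, step-size and perturbation schedules, and noise level are all illustrative assumptions; the game's pseudo-gradient is strictly monotone with a unique Nash equilibrium at the origin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical strictly monotone two-player game (illustrative only):
#   f1(x1, x2) = x1**2 + x1*x2,   f2(x1, x2) = x2**2 - x1*x2
# Pseudo-gradient F(x) = (2*x1 + x2, 2*x2 - x1); its Jacobian has
# symmetric part 2*I, so F is strictly monotone and x* = (0, 0) is
# the unique Nash equilibrium.
def f1(x, noise):
    return x[0] ** 2 + x[0] * x[1] + noise

def f2(x, noise):
    return x[1] ** 2 - x[0] * x[1] + noise

costs = [f1, f2]
x = np.array([0.8, -0.6])   # initial joint action in the feasible set [-1, 1]^2
sigma = 0.01                # std of the additive evaluation noise (assumed)

for k in range(1, 5001):
    eta = 0.5 / k ** 0.6    # diminishing step size (illustrative schedule)
    delta = 0.1 / k ** 0.25 # shrinking perturbation radius (illustrative)
    g = np.zeros(2)
    for i in range(2):
        d = rng.choice([-1.0, 1.0])   # Rademacher perturbation direction
        e = np.zeros(2)
        e[i] = delta * d              # player i perturbs only its own variable
        # two noisy black-box evaluations of player i's own cost
        fp = costs[i](x + e, sigma * rng.standard_normal())
        fm = costs[i](x - e, sigma * rng.standard_normal())
        g[i] = (fp - fm) / (2 * delta * d)  # simultaneous-perturbation estimate
    # mirror descent with the Euclidean mirror map = projected gradient step
    x = np.clip(x - eta * g, -1.0, 1.0)

print(x)  # iterate approaches the Nash equilibrium (0, 0)
```

No information is exchanged between the two players: each one queries only its own cost at the perturbed joint action, matching the communication-free setting described above.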