Model-free learning for multi-agent stochastic games is an active area of research. Existing reinforcement learning algorithms, however, are often restricted to zero-sum games and are applicable only in small state-action spaces or other simplified settings. Here, we develop a new data-efficient deep Q-learning methodology for model-free learning of Nash equilibria in general-sum stochastic games. The algorithm uses a local linear-quadratic expansion of the stochastic game, which leads to analytically solvable optimal actions. The expansion is parametrized by deep neural networks to give it sufficient flexibility to learn the environment without the need to experience all state-action pairs. We study symmetry properties of the algorithm stemming from label-invariant stochastic games and, as a proof of concept, apply our algorithm to learning optimal trading strategies in competitive electronic markets.
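To make the core idea concrete, the sketch below is a minimal, single-agent illustration (not the paper's multi-agent Nash solver) of a Q-network whose output is a local linear-quadratic expansion in the action, so that the greedy action is available in closed form. All names (`LocalLQQNetwork`, `mu_head`, and so on) and the specific quadratic parametrization are illustrative assumptions; the algorithm in the paper extends this by solving a coupled linear-quadratic game across agents at each state.

```python
import torch
import torch.nn as nn


class LocalLQQNetwork(nn.Module):
    """Q(s, a) ~ V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)).

    Because the expansion is quadratic in the action, the maximizing
    action is available analytically: a* = mu(s).
    """

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.action_dim = action_dim
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value_head = nn.Linear(hidden, 1)        # V(s)
        self.mu_head = nn.Linear(hidden, action_dim)  # argmax action mu(s)
        # Entries of a Cholesky factor of the curvature matrix P(s)
        self.chol_head = nn.Linear(hidden, action_dim * (action_dim + 1) // 2)

    def _curvature(self, h: torch.Tensor) -> torch.Tensor:
        # Build a lower-triangular L(s) and return P(s) = L L^T (positive definite).
        batch = h.shape[0]
        L = torch.zeros(batch, self.action_dim, self.action_dim, device=h.device)
        idx = torch.tril_indices(self.action_dim, self.action_dim)
        L[:, idx[0], idx[1]] = self.chol_head(h)
        diag = torch.arange(self.action_dim)
        L[:, diag, diag] = nn.functional.softplus(L[:, diag, diag]) + 1e-3
        return L @ L.transpose(1, 2)

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        h = self.body(state)
        V = self.value_head(h).squeeze(-1)
        mu = self.mu_head(h)
        P = self._curvature(h)
        d = (action - mu).unsqueeze(-1)
        quad = 0.5 * (d.transpose(1, 2) @ P @ d).squeeze(-1).squeeze(-1)
        return V - quad

    def greedy_action(self, state: torch.Tensor) -> torch.Tensor:
        # Closed-form maximizer of the local quadratic expansion; no action search needed.
        return self.mu_head(self.body(state))
```

Under these assumptions, a standard Q-learning target can be formed without enumerating actions, since the maximum over actions of the quadratic expansion is simply `V(s)`; this is what makes the approach tractable in continuous action spaces.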