In multi-agent reinforcement learning, independent learners are those that do not access the action selections of other learning agents in the system. This paper investigates the feasibility of using independent learners to find approximate equilibrium policies in non-episodic, discounted stochastic games. We define a property, here called the $\epsilon$-revision paths property, and prove that a class of games exhibiting symmetry among the players has this property for any $\epsilon \geq 0$. Building on this result, we present an independent learning algorithm that comes with high-probability guarantees of approximate equilibrium in this class of games. This guarantee is made assuming symmetry alone, without additional assumptions such as a zero-sum, team, or potential game structure.
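For concreteness, one standard way to formalize approximate equilibrium in a discounted stochastic game is sketched here. The notation is an assumption made for illustration and is not taken from the text above: $V^i_{\pi}(s)$ is player $i$'s expected discounted return from initial state $s$ under the joint policy $\pi = (\pi^i, \pi^{-i})$, $r^i$ is player $i$'s reward function, $\gamma \in [0,1)$ is the discount factor, and players are taken to maximize return (a cost-minimizing convention reverses the inequality).

$$
V^i_{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r^i(s_t, a_t) \,\middle|\, s_0 = s \right],
\qquad
V^i_{(\pi^{*,i},\, \pi^{*,-i})}(s) \;\geq\; \sup_{\pi^i} V^i_{(\pi^i,\, \pi^{*,-i})}(s) - \epsilon
\quad \text{for every player } i \text{ and state } s.
$$

Under this reading, a joint policy $\pi^{*}$ satisfying the second condition is an $\epsilon$-equilibrium, and $\epsilon = 0$ recovers an exact equilibrium.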