The empirical success of multi-agent reinforcement learning is encouraging, yet few theoretical guarantees have been established. In this work, we prove that the plug-in solver approach, arguably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic games (TBSGs). Specifically, we plan in an empirical TBSG constructed with a `simulator' that allows sampling from an arbitrary state-action pair. We show that the Nash equilibrium strategy of the empirical TBSG is an approximate Nash equilibrium strategy in the true TBSG, and we give both problem-dependent and problem-independent bounds. We develop absorbing-TBSG and reward-perturbation techniques to tackle the complex statistical dependence. The key idea is to artificially introduce a suboptimality gap into the TBSG so that the Nash equilibrium strategy lies in a finite set.
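To make the plug-in solver approach concrete, the following is a minimal sketch in Python. It assumes a known reward function, a generative model `simulator(s, a)` returning one sampled next state, and an `owner` array marking which player controls each state; all of these names, and the choice of value iteration as the planning subroutine, are illustrative assumptions rather than details fixed by the abstract.

```python
import numpy as np

def plug_in_solver(simulator, owner, S, A, N, gamma, reward, iters=1000):
    """Sketch of the plug-in solver for a discounted TBSG.

    owner[s] is +1 if the max-player controls state s, -1 for the min-player.
    simulator(s, a) returns one sampled next state; reward has shape (S, A).
    (Hypothetical interface, for illustration only.)
    """
    # Step 1: estimate the empirical transition model P_hat
    # from N simulator samples per state-action pair.
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(N):
                P_hat[s, a, simulator(s, a)] += 1.0 / N

    # Step 2: plan in the empirical TBSG via value iteration:
    # max-player states maximize Q over actions, min-player states minimize.
    V = np.zeros(S)
    for _ in range(iters):
        Q = reward + gamma * P_hat @ V          # shape (S, A)
        V = np.where(owner == +1, Q.max(axis=1), Q.min(axis=1))

    # Step 3: read off the empirical Nash equilibrium strategy,
    # which the paper shows is approximately optimal in the true TBSG.
    Q = reward + gamma * P_hat @ V
    policy = np.where(owner == +1, Q.argmax(axis=1), Q.argmin(axis=1))
    return policy, V
```

The point of the sketch is that the algorithm itself is entirely standard: all sampling happens up front, and planning is done purely in the empirical model; the paper's contribution is the sample-complexity analysis of this procedure.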