We study which dataset assumptions permit solving offline two-player zero-sum Markov games. In stark contrast to the offline single-agent Markov decision process, we show that the single-strategy concentration assumption is insufficient for learning a Nash equilibrium (NE) strategy in offline two-player zero-sum Markov games. We instead propose a new assumption, named unilateral concentration, and design a pessimism-type algorithm that is provably efficient under this assumption. We further show that the unilateral concentration assumption is necessary for learning an NE strategy. Moreover, our algorithm achieves minimax sample complexity without any modification in two widely studied settings: datasets satisfying the uniform concentration assumption and turn-based Markov games. Our work serves as an important initial step towards understanding offline multi-agent reinforcement learning.