Offline reinforcement learning (offline RL) is an emerging field that has recently gained attention across various application domains due to its ability to learn strategies from previously collected datasets. Offline RL has proved very successful, paving the way to solving previously intractable real-world problems, and we aim to generalize this paradigm to the multiplayer-game setting. To this end, we introduce the problem of offline equilibrium finding (OEF) and construct multiple types of datasets across a wide range of games using several established methods. To solve the OEF problem, we design a model-based framework that can directly apply any online equilibrium finding algorithm to the OEF setting with minimal changes. We adapt the three most prominent contemporary online equilibrium finding algorithms to the OEF context, creating three model-based variants: OEF-PSRO and OEF-CFR, which generalize the widely used algorithms PSRO and Deep CFR to compute Nash equilibria (NEs), and OEF-JPSRO, which generalizes JPSRO to compute (coarse) correlated equilibria ((C)CEs). We also combine the behavior cloning policy with the model-based policy to further improve performance and provide a theoretical guarantee of the solution quality. Extensive experimental results demonstrate the superiority of our approach over offline RL algorithms and the importance of using model-based methods for OEF problems. We hope our work will contribute to advancing research in large-scale equilibrium finding.
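The combination of the behavior cloning policy with the model-based policy mentioned above can be pictured as a convex mixture of the two action distributions. The following minimal Python sketch illustrates that idea only; the function name `combined_policy` and the mixture weight `alpha` are hypothetical and not taken from the paper's implementation, and both policies are assumed to output probability vectors over the same action set.

```python
import numpy as np

def combined_policy(pi_model_based: np.ndarray,
                    pi_behavior_cloning: np.ndarray,
                    alpha: float = 0.5) -> np.ndarray:
    """Convex combination of a model-based policy and a behavior cloning
    policy at a given information state (illustrative sketch only)."""
    assert 0.0 <= alpha <= 1.0, "mixture weight must lie in [0, 1]"
    mixed = alpha * pi_model_based + (1.0 - alpha) * pi_behavior_cloning
    return mixed / mixed.sum()  # renormalize to guard against numerical drift

# Example with three actions at some information state:
pi_mb = np.array([0.6, 0.3, 0.1])   # policy derived from the learned environment model
pi_bc = np.array([0.2, 0.5, 0.3])   # policy cloned directly from the offline dataset
print(combined_policy(pi_mb, pi_bc, alpha=0.7))
```

In such a scheme, `alpha` trades off trust in the learned model (useful where the dataset covers the game well) against fidelity to the observed behavior (useful where the model may extrapolate poorly).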