We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies in imperfect-information games with multiple agents. Formally, DREAM converges to a Nash equilibrium in two-player zero-sum games and to an extensive-form coarse correlated equilibrium in all other games. Our primary innovation is an effective algorithm that, in contrast to other regret-based deep learning algorithms, does not require access to a perfect simulator of the game to achieve good performance. We show that DREAM empirically achieves state-of-the-art performance among model-free algorithms in popular benchmark games, and is even competitive with algorithms that do use a perfect simulator.
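For background, the regret-based algorithms referenced above derive a policy at each decision point from accumulated regrets via the classic regret-matching rule, which DREAM approximates with neural networks rather than tables. The sketch below is a minimal tabular illustration of that rule under these assumptions, not the paper's method; the function name and the example regret values are hypothetical.

```python
import numpy as np

def regret_matching_policy(cumulative_regret):
    """Map cumulative regrets at one decision point to a policy.

    Positive regrets are normalized into action probabilities; if no
    action has positive regret, play uniformly. This is the standard
    regret-matching rule underlying regret-based methods such as CFR.
    """
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    # No action is regretted: fall back to the uniform policy.
    return np.full_like(cumulative_regret, 1.0 / len(cumulative_regret))

# Hypothetical regrets accumulated for three actions at one infoset.
regrets = np.array([2.0, -1.0, 1.0])
print(regret_matching_policy(regrets))  # -> [0.6667 0.     0.3333]
```

Averaging the policies produced by this rule over iterations is what drives the convergence guarantees mentioned above: as average regret vanishes, the average strategy approaches a Nash equilibrium in two-player zero-sum games.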