In this paper, we present a deep learning framework for solving large-scale multi-agent non-cooperative stochastic games using fictitious play. The Hamilton-Jacobi-Bellman (HJB) PDE associated with each agent is reformulated as a set of Forward-Backward Stochastic Differential Equations (FBSDEs) and solved via forward sampling on a suitably defined neural network architecture. Decision-making in multi-agent systems suffers from the curse of dimensionality and from strategy degeneration as the number of agents and the time horizon increase. We propose a novel Deep FBSDE controller framework that is shown to outperform the current state-of-the-art deep fictitious play algorithm on a high-dimensional inter-bank lending/borrowing problem. More importantly, our approach mitigates the curse of many agents and reduces computational and memory complexity, allowing us to scale up to 1,000 agents in simulation, which, to the best of our knowledge, represents a new state of the art. Finally, we showcase the framework's applicability in robotics on a belief-space autonomous racing problem.
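To make the "forward sampling" step concrete, the following is a minimal sketch of simulating agent states forward in time for an inter-bank lending/borrowing game. It is an illustration under stated assumptions, not the paper's implementation: the dynamics are assumed to follow the commonly used mean-reverting form dX^i = [a(mean(X) - X^i) + u^i] dt + sigma (rho dW^0 + sqrt(1 - rho^2) dW^i), the function name `simulate_interbank`, all parameter values, and the `control` callback are hypothetical, and the neural-network controller is replaced by a placeholder that defaults to zero control.

```python
import numpy as np


def simulate_interbank(n_agents=10, n_steps=50, horizon=1.0, a=0.1,
                       sigma=0.2, rho=0.5, control=None, seed=0):
    """Euler-Maruyama forward sampling of assumed inter-bank dynamics.

    control: optional callable (t, x) -> array of shape (n_agents,);
             a stand-in for a learned controller (hypothetical interface).
    Returns an array of shape (n_steps + 1, n_agents) of log-reserves.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    x = np.zeros((n_steps + 1, n_agents))  # assumed initial state: all zeros
    for k in range(n_steps):
        # Each agent's control; zero if no controller is supplied.
        u = np.zeros(n_agents) if control is None else control(k * dt, x[k])
        # Common noise shared by all agents plus idiosyncratic noise.
        dw_common = rng.normal(0.0, np.sqrt(dt))
        dw_own = rng.normal(0.0, np.sqrt(dt), size=n_agents)
        # Mean-reverting drift toward the population average.
        drift = a * (x[k].mean() - x[k]) + u
        noise = sigma * (rho * dw_common + np.sqrt(1.0 - rho**2) * dw_own)
        x[k + 1] = x[k] + drift * dt + noise
    return x


# Example: sample one trajectory for 1,000 agents.
paths = simulate_interbank(n_agents=1000, n_steps=20)
```

In a fictitious-play setting, one would typically hold the other agents' controls fixed at their previous-iteration values while updating a single agent's controller on such sampled trajectories; that training loop is omitted here.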