Due to the high sample complexity of reinforcement learning, simulation is, as of today, critical for its successful application. Many real-world problems, however, exhibit highly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to decompose large networked systems of many agents into multiple local components such that we can build separate simulators that run independently and in parallel. To account for the influence that the different local components exert on one another, each of these simulators is equipped with a learned model that is periodically trained on real trajectories. Our empirical results reveal that distributing the simulation among different processes not only makes it possible to train large multi-agent systems in just a few hours but also helps mitigate the negative effects of simultaneous learning.
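The following is a minimal, hypothetical sketch of the setup described above, not the paper's actual implementation: the system is split into local components, each simulated in its own process, and each local simulator carries a small learned model that predicts the influence arriving from the rest of the network, to be refit periodically on real trajectories. All names (LocalSimulator, InfluencePredictor, run_component) and the toy linear dynamics are illustrative assumptions.

```python
import numpy as np
from multiprocessing import Pool


class InfluencePredictor:
    """Least-squares model mapping the local state to predicted external inputs."""

    def __init__(self, dim_in, dim_out):
        self.W = np.zeros((dim_in, dim_out))

    def fit(self, X, Y):
        # Periodically refit on trajectories collected from the real (global) system.
        self.W, *_ = np.linalg.lstsq(X, Y, rcond=None)

    def predict(self, x):
        return x @ self.W


class LocalSimulator:
    """Simulates one local component; effects of other components come from the learned model."""

    def __init__(self, dim_state, dim_influence, seed):
        self.rng = np.random.default_rng(seed)
        self.A = 0.9 * np.eye(dim_state)                            # toy local dynamics
        self.B = 0.1 * self.rng.normal(size=(dim_influence, dim_state))
        self.influence = InfluencePredictor(dim_state, dim_influence)

    def rollout(self, steps):
        s = np.zeros(self.A.shape[0])
        states = []
        for _ in range(steps):
            u_hat = self.influence.predict(s)                       # predicted external influence
            s = s @ self.A + u_hat @ self.B + self.rng.normal(scale=0.01, size=s.shape)
            states.append(s.copy())
        return np.array(states)


def run_component(args):
    # Each worker process runs one local simulator independently of the others.
    seed, steps = args
    sim = LocalSimulator(dim_state=4, dim_influence=2, seed=seed)
    # In the full method, sim.influence.fit(...) would be called periodically
    # with real trajectories; this sketch keeps the initial (zero) model.
    return sim.rollout(steps)


if __name__ == "__main__":
    # Distribute the simulation of four local components across four processes.
    with Pool(processes=4) as pool:
        trajectories = pool.map(run_component, [(i, 100) for i in range(4)])
    print([t.shape for t in trajectories])  # four independent local rollouts
```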