We present a Reinforcement Learning (RL) algorithm to solve infinite horizon asymptotic Mean Field Game (MFG) and Mean Field Control (MFC) problems. Our approach can be described as a unified two-timescale Mean Field Q-learning: the \emph{same} algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning parameters. The algorithm works in discrete time and space, where the agent provides not only an action but also a distribution of the state to the environment, in order to account for the mean field feature of the problem. Importantly, we assume that the agent cannot observe the population's distribution and needs to estimate it in a model-free manner. The asymptotic MFG and MFC problems are also presented in continuous time and space, and compared with classical (non-asymptotic, or stationary) MFG and MFC problems. They lead to explicit solutions in the linear-quadratic (LQ) case, which are used as benchmarks for the results of our algorithm.
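To make the two-timescale idea concrete, the following is a minimal tabular sketch (not the authors' implementation): the Q-table and the estimated state distribution are updated with separate learning rates, and the ratio of these rates is the tuning knob mentioned in the abstract. All names here (\texttt{env}, \texttt{rho\_Q}, \texttt{rho\_mu}, the interface of \texttt{step}) are illustrative assumptions rather than quantities from the paper.
\begin{verbatim}
import numpy as np

def two_timescale_mfq(env, n_states, n_actions, n_steps,
                      rho_Q, rho_mu, gamma=0.99, eps=0.1, seed=0):
    """Sketch of unified two-timescale mean field Q-learning.

    The ratio rho_mu / rho_Q governs whether the distribution estimate
    evolves slowly relative to Q (MFG-like regime) or at a comparable
    speed (MFC-like regime). Purely illustrative, not the paper's code.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))      # state-action value table
    mu = np.ones(n_states) / n_states        # model-free estimate of the state distribution
    x = env.reset()                          # assumed: returns an integer state index
    for _ in range(n_steps):
        # epsilon-greedy action from the current Q estimate
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[x]))
        # assumed interface: reward and transition depend on (state, action, distribution)
        x_next, r = env.step(x, a, mu)
        # fast update of Q with learning rate rho_Q
        Q[x, a] += rho_Q * (r + gamma * Q[x_next].max() - Q[x, a])
        # slow (or fast) update of the distribution estimate with rate rho_mu
        delta = np.zeros(n_states)
        delta[x_next] = 1.0
        mu += rho_mu * (delta - mu)
        x = x_next
    return Q, mu
\end{verbatim}
In this sketch, choosing \texttt{rho\_mu} much smaller than \texttt{rho\_Q} freezes the population estimate while the value function converges, whereas comparable rates let the two co-evolve; this is only meant to illustrate how a single loop can target either solution by changing the ratio.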