Mean field games (MFG) and mean field control problems (MFC) are frameworks to study Nash equilibria or social optima in games with a continuum of agents. These problems can be used to approximate competitive or cooperative games with a large finite number of agents and have found a broad range of applications, in particular in economics. In recent years, the question of learning in MFG and MFC has garnered interest, both as a way to compute solutions and as a way to model how large populations of learners converge to an equilibrium. Of particular interest is the setting where the agents do not know the model, which motivates the development of reinforcement learning (RL) methods. After reviewing the literature on this topic, we present a two-timescale approach with RL for MFG and MFC, which relies on a unified Q-learning algorithm. The main novelty of this method is to simultaneously update an action-value function and a distribution, but at different rates, in a model-free fashion. Depending on the ratio of the two learning rates, the algorithm learns either the MFG or the MFC solution. To illustrate this method, we apply it to a mean field problem of accumulated consumption over a finite horizon with a HARA utility function, and to a trader's optimal liquidation problem.
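To make the two-timescale idea concrete, below is a minimal, self-contained sketch (not the paper's exact algorithm) of a unified Q-learning loop that updates an action-value table Q and a population distribution mu with two different learning rates. The toy environment, the crowd-aversion reward, and the names rho_Q, rho_mu, and step are illustrative assumptions; in this sketch, taking rho_Q much larger than rho_mu corresponds to the regime targeting the MFG solution, while reversing the magnitudes targets the MFC solution.

```python
import numpy as np

# Sketch of a unified two-timescale Q-learning loop on a toy ring environment.
# The distribution mu and the action-value table Q are updated simultaneously
# but with different learning rates (rho_Q vs. rho_mu), in a model-free fashion.

n_states, n_actions = 5, 2
rho_Q, rho_mu = 0.1, 0.01   # rho_Q >> rho_mu here; swap magnitudes for the other regime
gamma, eps = 0.9, 0.1

Q = np.zeros((n_states, n_actions))
mu = np.ones(n_states) / n_states   # running estimate of the state distribution

def step(state, action, mu):
    """Hypothetical dynamics/reward: the action moves the agent left or right
    on a ring; the reward penalizes landing in crowded states (mean-field term)."""
    next_state = (state + (1 if action == 1 else -1)) % n_states
    reward = 1.0 - mu[next_state]
    return next_state, reward

rng = np.random.default_rng(0)
s = int(rng.integers(n_states))
for t in range(50_000):
    # epsilon-greedy action from the current Q estimate
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, r = step(s, a, mu)

    # update of the action-value function at rate rho_Q
    Q[s, a] += rho_Q * (r + gamma * np.max(Q[s_next]) - Q[s, a])

    # update of the distribution toward the visited state at rate rho_mu
    indicator = np.zeros(n_states)
    indicator[s_next] = 1.0
    mu += rho_mu * (indicator - mu)

    s = s_next

print("learned distribution:", np.round(mu, 3))
```

The key design point illustrated here is that neither update requires knowledge of the transition or reward model: both Q and mu are adjusted from sampled transitions only, and the ratio rho_mu / rho_Q selects which fixed point the coupled iteration tracks.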