Entropy regularization has been widely adopted to improve the efficiency, stability, and convergence of reinforcement learning algorithms. This paper analyzes, both quantitatively and qualitatively, the impact of entropy regularization on Mean Field Games (MFGs) with learning over a finite time horizon. Our study provides a theoretical justification that entropy regularization yields time-dependent policies and, moreover, helps stabilize and accelerate convergence to the game equilibrium. In addition, this study leads to a policy-gradient algorithm for exploration in MFGs, under which agents are able to learn the optimal exploration schedule, with stable and fast convergence to the game equilibrium.
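For orientation, entropy regularization augments each agent's reward with an entropy bonus on its policy. A schematic finite-horizon objective, under standard MFG notation that is assumed here rather than taken from the paper ($X_t$ the state, $\mu_t$ the population distribution, $\alpha_t$ the action, $\pi_t$ the policy, $\lambda > 0$ the regularization weight), might read:

$$
\sup_{\pi}\ \mathbb{E}\left[\int_0^T \Big( r(t, X_t, \mu_t, \alpha_t) + \lambda\, \mathcal{H}\big(\pi_t(\cdot \mid X_t)\big) \Big)\, dt \;+\; g(X_T, \mu_T)\right],
$$

where $\mathcal{H}$ denotes the entropy of the policy and $g$ a terminal reward. The entropy term rewards randomization, which drives the exploration behavior the abstract refers to; a time-dependent choice of $\lambda$ corresponds to an exploration schedule.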