Entropy-regularized Markov decision processes have been widely used in reinforcement learning. This paper is concerned with the primal-dual formulation of entropy-regularized problems. Standard first-order methods suffer from slow convergence because the objective lacks strict convexity and concavity. To address this issue, we first introduce a new quadratically convexified primal-dual formulation. Natural gradient ascent-descent applied to the new formulation enjoys a global convergence guarantee and an exponential convergence rate. We also propose a new interpolating metric that further accelerates convergence significantly. Numerical results demonstrate the performance of the proposed methods in multiple settings.
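To make the saddle-point intuition concrete, here is a minimal, purely illustrative Python sketch, not the paper's formulation, algorithm, or metric: plain gradient ascent-descent on a toy bilinear objective augmented with a hypothetical quadratic convexification term. The objective `f(x, y) = x^T A y + (lam/2)(||x||^2 - ||y||^2)`, the weight `lam`, and the step size `eta` are all assumptions chosen for illustration.

```python
import numpy as np

# Toy saddle objective: f(x, y) = x^T A y + (lam/2)(||x||^2 - ||y||^2).
# With lam = 0 the problem is bilinear, hence neither strictly convex in x
# nor strictly concave in y; a hypothetical lam > 0 mimics a quadratic
# convexification of the saddle problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
lam = 0.5   # hypothetical convexification weight (illustrative only)
eta = 0.05  # hypothetical step size (illustrative only)
x = rng.standard_normal(5)  # minimization (descent) variable
y = rng.standard_normal(5)  # maximization (ascent) variable

for t in range(2000):
    gx = A @ y + lam * x    # gradient of f with respect to x
    gy = A.T @ x - lam * y  # gradient of f with respect to y
    x -= eta * gx           # descend in x
    y += eta * gy           # ascend in y

# Both norms approach 0: the unique saddle point of the convexified problem.
print(np.linalg.norm(x), np.linalg.norm(y))
```

Setting `lam = 0` in this sketch makes the iterates orbit and slowly spiral away from the saddle point rather than converge, which illustrates why the lack of strict convexity-concavity slows down standard first-order methods and why a quadratic convexification can restore linear (exponential) convergence.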