We study entropy-regularized constrained Markov decision processes (CMDPs) under the softmax parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility. Leveraging the entropy regularization, we show that the Lagrangian dual function of this problem is smooth and that the Lagrangian duality gap can be decomposed into the primal optimality gap and the constraint violation. Building on these results, we propose an accelerated dual-descent method for entropy-regularized CMDPs and prove that it achieves a global convergence rate of $\widetilde{\mathcal{O}}(1/T)$ for both the optimality gap and the constraint violation. We also discuss a linear convergence rate for CMDPs with a single constraint.
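To make the dual-descent idea in the abstract concrete, the following is a minimal sketch for a tabular, discounted CMDP with a single constraint $V_g^{\pi}(\rho) \ge b$. It is not the authors' algorithm: it assumes that the inner maximization over policies is solved by soft value iteration on the combined reward $r + \lambda g$, that the dual gradient is $V_g^{\pi_\lambda}(\rho) - b$, and that the multiplier is updated by a standard Nesterov-style projected gradient step. All function names, the toy instance, the step size, and the temperature `tau` are hypothetical choices made for illustration.

```python
import math


def q_row(P, reward, V, gamma, s):
    """Q(s, a) = reward(s, a) + gamma * E[V(next state)] for every action a."""
    return [reward[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
            for a in range(len(reward[s]))]


def soft_optimal_policy(P, reward, gamma, tau, iters=200):
    """Soft value iteration; returns the policy pi(a|s) proportional to exp(Q(s,a)/tau)."""
    V = [0.0] * len(reward)
    for _ in range(iters):
        new_V = []
        for s in range(len(reward)):
            q = q_row(P, reward, V, gamma, s)
            m = max(q)  # subtract the max for a numerically stable log-sum-exp
            new_V.append(m + tau * math.log(sum(math.exp((x - m) / tau) for x in q)))
        V = new_V
    policy = []
    for s in range(len(reward)):
        q = q_row(P, reward, V, gamma, s)
        m = max(q)
        w = [math.exp((x - m) / tau) for x in q]
        policy.append([x / sum(w) for x in w])
    return policy


def policy_value(P, payoff, policy, gamma, rho, iters=200):
    """Iterative policy evaluation of an (unregularized) payoff, averaged over rho."""
    V = [0.0] * len(payoff)
    for _ in range(iters):
        V = [sum(pa * qa for pa, qa in zip(policy[s], q_row(P, payoff, V, gamma, s)))
             for s in range(len(payoff))]
    return sum(w * v for w, v in zip(rho, V))


def accelerated_dual_descent(P, r, g, b, rho, gamma=0.9, tau=0.5, step=0.1, T=100):
    """Nesterov-style projected gradient descent on the dual variable lam >= 0.

    Assumption: the dual function is D(lam) = max_pi [V_r + tau*H + lam*(V_g - b)],
    whose gradient at lam is V_g(pi_lam) - b.
    """
    lam = lam_prev = 0.0
    for t in range(1, T + 1):
        y = lam + (t - 1) / (t + 2) * (lam - lam_prev)  # momentum extrapolation
        combined = [[r[s][a] + y * g[s][a] for a in range(len(r[s]))]
                    for s in range(len(r))]
        pi = soft_optimal_policy(P, combined, gamma, tau)
        grad = policy_value(P, g, pi, gamma, rho) - b
        lam_prev, lam = lam, max(0.0, y - step * grad)  # project onto lam >= 0
    combined = [[r[s][a] + lam * g[s][a] for a in range(len(r[s]))]
                for s in range(len(r))]
    return lam, soft_optimal_policy(P, combined, gamma, tau)


# Hypothetical 2-state, 2-action instance: action 0 earns reward, action 1 earns utility.
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]]]  # P[s][a][next_s]
r = [[1.0, 0.0], [0.5, 0.0]]
g = [[0.0, 1.0], [0.0, 1.0]]
rho = [0.5, 0.5]

lam, pi = accelerated_dual_descent(P, r, g, b=5.0, rho=rho)
print("dual variable:", lam)
print("utility value:", policy_value(P, g, pi, 0.9, rho))
```

Because the entropy term makes the inner maximizer unique, the gradient used above is well defined at every `lam`; the fixed step size here is a guess and would in practice be tied to the smoothness constant of the dual function.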