In this short note, we give a convergence analysis of the policy iterates in the recently popular policy mirror descent (PMD). We mainly consider the unregularized setting with a generalized Bregman divergence, following [11]. The difference is that we directly establish convergence rates for the policy itself under the generalized Bregman divergence. Our results are inspired by the convergence of the value function in previous works and can be viewed as an extension of the study of policy mirror descent. Although some of these results have appeared in previous work, we further show that a large class of Bregman divergences, such as the classical Euclidean distance, yields finite-step convergence to an optimal policy.
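For reference, a minimal sketch of the unregularized PMD update with a generalized Bregman divergence; the notation here is ours and is not taken from [11]. Given the current policy $\pi_k$, its state-action value function $Q^{\pi_k}$, a step size $\eta_k$, and a mirror map $h$, the per-state update reads
$$
\pi_{k+1}(\cdot \mid s) \in \operatorname*{arg\,max}_{p \in \Delta(\mathcal{A})} \Big\{ \eta_k \big\langle Q^{\pi_k}(s,\cdot),\, p \big\rangle - D_h\big(p,\, \pi_k(\cdot \mid s)\big) \Big\},
\qquad
D_h(p, q) = h(p) - h(q) - \langle \nabla h(q),\, p - q \rangle .
$$
Taking $h(p) = \tfrac12\|p\|_2^2$ gives the Euclidean distance mentioned above, while $h(p) = \sum_a p(a)\log p(a)$ gives the Kullback-Leibler divergence.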