This paper studies stochastic control problems regularized by the relative entropy, where the action space is the space of measures. This setting includes relaxed control problems and problems of finding Markovian controls in which the control function is replaced by an idealized, infinitely wide neural network, and it can be extended to the search for causal optimal transport maps. By exploiting the Pontryagin optimality principle, we identify a suitable metric space on which we construct a gradient flow for the measure-valued control process along which the cost functional is guaranteed to decrease. It is shown that, under appropriate conditions, this gradient flow has an invariant measure which is the optimal control for the regularized stochastic control problem. If the problem is sufficiently convex, the gradient flow converges exponentially fast. Furthermore, the optimal measure-valued control admits a Bayesian interpretation, which means that prior knowledge can be incorporated when solving stochastic control problems. This work is motivated by a desire to extend the theoretical underpinning for the convergence of stochastic-gradient-type algorithms widely used in the reinforcement learning community to solve control problems.