In this paper, we present a novel maximum entropy formulation of the Differential Dynamic Programming algorithm and derive two variants using unimodal and multimodal value function parameterizations. By combining the maximum entropy Bellman equations with a particular approximation of the cost function, we obtain a new formulation of Differential Dynamic Programming that can escape local minima through exploration with a multimodal policy. To demonstrate the efficacy of the proposed algorithm, we present experimental results on four systems performing tasks whose cost functions have multiple local minima, and we compare against vanilla Differential Dynamic Programming. Finally, we discuss connections to previous work on the linearly solvable stochastic control framework and its extensions with respect to compositionality.
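The mechanism the abstract relies on can be illustrated with a maximum entropy (soft) Bellman backup, whose optimal policy is a Boltzmann distribution over controls, in one dimension. This is a minimal sketch, not the paper's algorithm. It assumes a discretized scalar control, a hypothetical bimodal cost-to-go `q_bimodal`, and a temperature `alpha = 0.1`. It only shows why a maximum entropy policy keeps probability mass on both minima, while a single Gaussian matched to that policy would be centered between them, at a local maximum of the cost.

```python
import math


def soft_min(values, alpha):
    """Soft-min V = -alpha * log(sum(exp(-v / alpha))), computed stably."""
    m = min(values)
    return m - alpha * math.log(sum(math.exp(-(v - m) / alpha) for v in values))


def boltzmann_policy(q_values, alpha):
    """Maximum entropy policy pi(u) proportional to exp(-Q(u) / alpha)."""
    m = min(q_values)
    weights = [math.exp(-(q - m) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def q_bimodal(u):
    # Hypothetical cost-to-go with two equally good minima at u = -1 and u = +1
    # and a local maximum (cost 1) at u = 0.
    return (u * u - 1.0) ** 2


actions = [i / 10.0 for i in range(-20, 21)]  # discretized controls in [-2, 2]
q = [q_bimodal(u) for u in actions]
alpha = 0.1  # assumed temperature; smaller alpha sharpens the policy

pi = boltzmann_policy(q, alpha)
v = soft_min(q, alpha)  # soft value, always <= min(q)

# Mean of the policy: a single Gaussian matched to pi would sit here (near u = 0),
# on the cost peak, even though pi itself concentrates on u = -1 and u = +1.
mean_u = sum(p * u for p, u in zip(pi, actions))
```

As `alpha` shrinks, `soft_min` approaches the hard minimum and the policy collapses onto the best controls. Larger values spread mass across both basins, which is the kind of exploration a multimodal policy can exploit.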