Numerically computing global policies for optimal control problems in complex dynamical systems is mostly intractable. Consequently, a number of approximation methods have been developed. However, none of the current methods can quantify by how much the resulting control underperforms the elusive globally optimal solution. Here we propose policy decomposition, an approximation method with explicit suboptimality estimates. Our method decomposes the optimal control problem into lower-dimensional subproblems, whose optimal solutions are recombined to build a control policy for the entire system. Many such combinations exist, and we introduce the value error and its LQR and DDP estimates to predict the suboptimality of possible combinations and prioritize the ones that minimize it. Using a cart-pole, a 3-link balancing biped, and N-link planar manipulators as example systems, we find that the estimates correctly identify the best combinations, yielding control policies in a fraction of the time it takes to compute the optimal control, without a notable sacrifice in closed-loop performance. While more research is needed to find ways of dealing with the combinatorics of policy decomposition, the results suggest this method could be an effective alternative for approximating optimal control in intractable systems.