A Direct-Indirect Hybridization Approach to Control-Limited DDP

Differential Dynamic Programming (DDP) is an indirect method for trajectory optimization. Its efficiency derives from exploiting the temporal structure inherent to optimal control problems and from the explicit roll-out (integration) of the system dynamics. However, it suffers from numerical instability and, compared to direct methods, it offers limited initialization options (the controls can be initialized, but not the states) and lacks proper handling of control constraints. These limitations arise because DDP is a single shooting algorithm. In this work, we tackle these issues with a direct-indirect hybridization approach that is primarily driven by the dynamic feasibility of the optimal control problem. Our feasibility search emulates the numerical resolution of a direct transcription problem with only dynamics constraints, namely a multiple shooting formulation. We show that our approach has better numerical convergence than BOX-DDP (a shooting method), and that its convergence rate and runtime performance are competitive with state-of-the-art direct transcription formulations solved using the interior point and active set algorithms available in KNITRO. We further show that our approach decreases the dynamic feasibility error monotonically, as state-of-the-art nonlinear programming algorithms do. We demonstrate the benefits of our hybrid approach by generating complex and athletic motions for quadruped and humanoid robots.
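
To make the notion of dynamic feasibility concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy double-integrator model with explicit Euler integration, and the names `step`, `rollout`, `gaps`, and `feasibility_error` are hypothetical. It shows only the defect (gap) that a multiple shooting formulation constrains to zero: a single shooting roll-out has zero gaps by construction, whereas an independently chosen state guess generally does not.

```python
# Toy dynamics, chosen only for illustration (not taken from the paper):
# a double integrator with state x = (position, velocity), explicit Euler.
DT = 0.1


def step(x, u):
    """One integration step of the toy dynamics."""
    pos, vel = x
    return (pos + DT * vel, vel + DT * u)


def rollout(x0, us):
    """Single shooting: states are produced by integrating the controls."""
    xs = [x0]
    for u in us:
        xs.append(step(xs[-1], u))
    return xs


def gaps(xs, us):
    """Multiple shooting defects: step(x_k, u_k) - x_{k+1} at every node."""
    out = []
    for k, u in enumerate(us):
        nxt = step(xs[k], u)
        out.append((nxt[0] - xs[k + 1][0], nxt[1] - xs[k + 1][1]))
    return out


def feasibility_error(xs, us):
    """Largest absolute defect over the horizon (infinity norm)."""
    return max(max(abs(g[0]), abs(g[1])) for g in gaps(xs, us))


us = [1.0] * 5
x0 = (0.0, 0.0)

# Roll-out from the controls: dynamically feasible by construction.
xs_rollout = rollout(x0, us)

# A state guess that ignores the dynamics: positions on a straight line
# from 0 to 1, all velocities zero. Initializing states this way is the
# option that direct methods offer and single shooting does not.
xs_guess = [(k / 5, 0.0) for k in range(6)]

print("rollout error:", feasibility_error(xs_rollout, us))
print("guess error:  ", feasibility_error(xs_guess, us))
```

In this sketch the infeasible guess is only measured, not repaired. Driving these gaps to zero during the optimization is what the feasibility search described above is concerned with.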
