This letter presents a method to reduce the computational demands of incorporating second-order dynamics sensitivity information into the Differential Dynamic Programming (DDP) trajectory optimization algorithm. An approach to DDP is developed in which all the necessary derivatives are computed with the same complexity as in the iterative Linear Quadratic Regulator (iLQR). Compared to the linearized models used in iLQR, DDP represents the dynamics more accurately locally, but it is not often used because the second-order derivatives of the dynamics are tensorial and expensive to compute. This work shows how to avoid computing the derivative tensor by instead leveraging reverse-mode accumulation of derivative information to compute a key vector-tensor product directly. We also show how the structure of the dynamics can be used to further accelerate these computations in rigid-body systems. Benchmarks of this approach for trajectory optimization with multi-link manipulators show that the benefits of DDP can often be included without sacrificing evaluation time, and that the optimization can be completed in fewer iterations than with iLQR.
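The vector-tensor product mentioned in the abstract can be illustrated with generic automatic differentiation. The sketch below is not the rigid-body algorithm of the letter; it only shows the underlying idea in JAX on a hypothetical toy system. The names `dynamics`, `contracted_hessian`, and `full_tensor_contraction` are assumptions made for illustration, as are the pendulum model and its constants. The point is that contracting the dynamics with a vector `lam` first (in DDP this role is played by the gradient of the value function) yields a scalar function, whose second derivatives can be taken without ever forming the third-order tensor.

```python
import jax
import jax.numpy as jnp


def dynamics(x, u):
    # Hypothetical toy model: damped pendulum with one explicit Euler step.
    dt = 0.01
    theta, omega = x[0], x[1]
    omega_next = omega + dt * (-9.81 * jnp.sin(theta) - 0.1 * omega + u[0])
    theta_next = theta + dt * omega
    return jnp.array([theta_next, omega_next])


def contracted_hessian(lam, x, u):
    # Second derivatives of the scalar lam^T f(x, u) with respect to z = (x, u).
    # The contraction with lam happens before differentiation, so the
    # (n, n+m, n+m) tensor of second derivatives is never built.
    n = x.shape[0]
    z = jnp.concatenate([x, u])

    def scalar(z):
        return jnp.dot(lam, dynamics(z[:n], z[n:]))

    return jax.hessian(scalar)(z)


def full_tensor_contraction(lam, x, u):
    # Reference computation: build the full tensor, then contract with lam.
    n = x.shape[0]
    z = jnp.concatenate([x, u])
    tensor = jax.hessian(lambda z: dynamics(z[:n], z[n:]))(z)
    return jnp.einsum("i,ijk->jk", lam, tensor)


lam = jnp.array([1.0, 2.0])
x = jnp.array([0.3, 0.0])
u = jnp.array([0.5])

H_direct = contracted_hessian(lam, x, u)
H_reference = full_tensor_contraction(lam, x, u)
```

Both functions return the same `(n+m) x (n+m)` matrix; the first avoids the intermediate tensor, which is what makes the second-order terms affordable as the state dimension grows. For this toy model only the `sin(theta)` term is nonlinear, so the sole nonzero entry is the `(0, 0)` one.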