This paper presents a method to reduce the computational complexity of including second-order dynamics sensitivity information in the Differential Dynamic Programming (DDP) trajectory optimization algorithm. A tensor-free approach to DDP is developed in which all the necessary derivatives are computed with the same complexity as in the iterative Linear Quadratic Regulator (iLQR). Compared to the linearized models used in iLQR, DDP represents the local dynamics more accurately, but it is not often used because the second-order derivatives of the dynamics are tensorial and expensive to compute. This work shows how to avoid computing the derivative tensor by instead leveraging reverse-mode accumulation of derivative information to compute a key vector-tensor product directly. We benchmark this approach on trajectory optimization for multi-link manipulators and show that the benefits of DDP can often be obtained without sacrificing evaluation time, and that the optimization can be completed in fewer iterations than with iLQR.
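The key vector-tensor product can be illustrated with a minimal sketch, assuming a stand-in model rather than the multi-link manipulators used in the paper. In a DDP backward pass, terms such as V_x'·f_xx, V_x'·f_ux and V_x'·f_uu are all blocks of the Hessian of the scalar λᵀf(x, u) with λ = V_x'. Differentiating a reverse-mode (adjoint) sweep of that scalar once more, here with dual numbers (forward-over-reverse), yields the contracted matrix directly, one sweep per input direction, so the second-derivative tensor of f is never formed. This is not the authors' implementation and may differ from their exact scheme; the unicycle dynamics, the Euler step `dt`, and the names `lam_f_grad` and `contracted_hessian` are hypothetical.

```python
import math


class Dual:
    """Dual number val + der*eps (eps**2 = 0) for forward-mode derivatives."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def sin(x):
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.der)
    return math.sin(x)


def cos(x):
    if isinstance(x, Dual):
        return Dual(math.cos(x.val), -math.sin(x.val) * x.der)
    return math.cos(x)


def lam_f_grad(z, lam, dt=0.1):
    """Hand-written reverse-mode (adjoint) sweep of the scalar lam^T f(z).

    Hypothetical unicycle, explicit Euler step, z = [px, py, h, v, w]:
    f = [px + dt*v*cos(h), py + dt*v*sin(h), h + dt*w].
    Works on floats or Dual numbers; returns d(lam^T f)/dz.
    """
    px, py, h, v, w = z
    # Forward pass: keep the intermediates the backward pass needs.
    c, s = cos(h), sin(h)
    # Backward pass: seed output adjoints with lam, accumulate into inputs.
    l1, l2, l3 = lam
    bar_c = dt * v * l1
    bar_s = dt * v * l2
    bar_px = l1
    bar_py = l2
    bar_h = l3 - s * bar_c + c * bar_s  # d cos/dh = -sin, d sin/dh = cos
    bar_v = dt * (c * l1 + s * l2)
    bar_w = dt * l3
    return [bar_px, bar_py, bar_h, bar_v, bar_w]


def contracted_hessian(z, lam, dt=0.1):
    """Return sum_i lam[i] * Hess(f_i)(z) via forward-over-reverse.

    One dual-number pass of the adjoint sweep per input direction; the
    3 x 5 x 5 second-derivative tensor of f is never stored.
    """
    n = len(z)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        zj = [Dual(zk, 1.0 if k == j else 0.0) for k, zk in enumerate(z)]
        g = lam_f_grad(zj, lam, dt)
        for i in range(n):
            # Components that do not depend on z come back as plain floats.
            H[i][j] = g[i].der if isinstance(g[i], Dual) else 0.0
    return H
```

In a DDP backward pass one would set `lam` to the next-step value gradient V_x' and add the state-state, control-state and control-control blocks of the returned matrix (here rows/columns 0-2 for the state, 3-4 for the control) to Q_xx, Q_ux and Q_uu. Each column costs one extra sweep, comparable to building a Jacobian column by forward mode, which is the sense in which such a scheme can stay close to iLQR's derivative cost.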