Trajectory optimization and model predictive control are essential techniques underpinning advanced robotic applications, ranging from autonomous driving to full-body humanoid control. State-of-the-art algorithms focus on data-driven approaches that infer the system dynamics online and incorporate posterior uncertainty during planning and control. Despite their success, such approaches remain susceptible to catastrophic errors arising from statistical learning biases, unmodeled disturbances, or even directed adversarial attacks. In this paper, we tackle the problem of dynamics mismatch and propose a distributionally robust optimal control formulation that alternates between two relative entropy trust-region optimization problems. Our method finds the worst-case maximum entropy Gaussian posterior over the dynamics parameters and the corresponding robust policy. Furthermore, we show that our approach admits a closed-form backward pass for a certain class of systems. Finally, we demonstrate the resulting robustness on linear and nonlinear numerical examples.
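As a rough illustration of what a relative-entropy-constrained robust control problem can look like, the sketch below writes a generic min-max objective and the well-known form of its inner worst-case solution. All symbols are assumptions introduced here and are not taken from the paper: $\pi$ is the policy, $\theta$ the dynamics parameters, $p(\theta)$ a nominal Gaussian posterior, $q(\theta)$ the adversarial distribution, $J(\pi, \theta)$ the expected trajectory cost, $\varepsilon$ a trust-region radius, and $\eta$ a Lagrange multiplier. The paper's actual formulation may differ, for instance in how the policy update is itself constrained.

```latex
% Generic distributionally robust objective (illustrative, not the paper's exact form)
\min_{\pi} \; \max_{q} \; \mathbb{E}_{\theta \sim q}\big[ J(\pi, \theta) \big]
\quad \text{s.t.} \quad \mathrm{KL}\big( q \,\|\, p \big) \le \varepsilon

% For a fixed policy, the inner maximization has the standard exponential-tilting solution
q^{*}(\theta) \;\propto\; p(\theta) \, \exp\!\left( \frac{J(\pi, \theta)}{\eta} \right),
\qquad \eta > 0
```

Under these assumptions, if $p$ is Gaussian and $J$ is quadratic in $\theta$, the tilted distribution $q^{*}$ is again Gaussian (provided its precision stays positive definite), which is consistent with the abstract's mention of a worst-case Gaussian posterior.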