Trajectory optimization and model predictive control are essential techniques underpinning advanced robotic applications, ranging from autonomous driving to full-body humanoid control. State-of-the-art algorithms have focused on data-driven approaches that infer the system dynamics online and incorporate posterior uncertainty during planning and control. Despite their success, such approaches are still susceptible to catastrophic errors that may arise from statistical learning biases, unmodeled disturbances, or even directed adversarial attacks. In this paper, we tackle the problem of dynamics mismatch and propose a distributionally robust optimal control formulation that alternates between two relative-entropy trust-region optimization problems. Our method finds the worst-case maximum-entropy Gaussian posterior over the dynamics parameters and the corresponding robust optimal policy. We show that our approach admits a closed-form backward pass for a certain class of systems and demonstrate the resulting robustness on linear and nonlinear numerical examples.
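To make the idea of a worst-case distribution inside a relative-entropy trust region concrete, the following is a minimal one-dimensional sketch, not the formulation used in the paper. It assumes a scalar parameter with Gaussian prior N(mu, sigma^2), a linear cost c(theta) = a * theta, and a KL bound eps. Under these assumptions the adversary's maximizer is the exponentially tilted prior, which stays Gaussian with the same variance and a shifted mean. The function names and the toy cost are hypothetical and chosen only for illustration.

```python
import math


def worst_case_gaussian_mean(mu, sigma, a, eps):
    """Adversarial mean for the toy problem (illustrative assumption, not the paper's method).

    Solves: maximize E_q[a * theta] subject to KL(q || N(mu, sigma^2)) <= eps.
    The maximizer is proportional to p(theta) * exp(a * theta / eta), i.e. a
    Gaussian with unchanged variance and mean mu + sigma^2 * a / eta.
    """
    if a == 0.0 or eps == 0.0:
        # Nothing to gain or no budget: the prior itself is optimal.
        return mu, math.inf
    # Temperature (dual variable) at which the KL constraint is exactly active.
    eta = sigma * abs(a) / math.sqrt(2.0 * eps)
    shifted_mu = mu + sigma ** 2 * a / eta
    return shifted_mu, eta


def kl_same_variance(mu_q, mu_p, sigma):
    """KL divergence between two Gaussians that share the standard deviation sigma."""
    return (mu_q - mu_p) ** 2 / (2.0 * sigma ** 2)


if __name__ == "__main__":
    mu_q, eta = worst_case_gaussian_mean(mu=1.0, sigma=0.5, a=2.0, eps=0.02)
    print("worst-case mean:", mu_q)
    print("KL used:", kl_same_variance(mu_q, 1.0, 0.5))
```

In this toy setting the mean moves by sigma * sqrt(2 * eps) in the direction that increases the cost, so a larger trust region or a more uncertain prior gives the adversary more room. A robust controller would then be optimized against this shifted distribution, and the two steps would be alternated.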