The Value of Planning for Infinite-Horizon Model Predictive Control

Model Predictive Control (MPC) is a classic tool for optimal control of complex, real-world systems. Although it has been successfully applied to a wide range of challenging tasks in robotics, it is fundamentally limited by the prediction horizon, which, if too short, results in myopic decisions. Recently, several papers have suggested using a learned value function as the terminal cost for MPC. If the value function is accurate, it effectively allows MPC to reason over an infinite horizon. Unfortunately, Reinforcement Learning (RL) solutions to value function approximation can be difficult to realize for robotics tasks. In this paper, we suggest a more efficient method for value function approximation that applies to goal-directed problems, like reaching and navigation. In these problems, MPC is often formulated to track a path or trajectory returned by a planner. However, this strategy is brittle in that unexpected perturbations to the robot will require replanning, which can be costly at runtime. Instead, we show how the intermediate data structures used by modern planners can be interpreted as an approximate value function. We show that this value function can be used by MPC directly, resulting in more efficient and resilient behavior at runtime.
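To make the central idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy 4-connected grid world with unit step costs, uses a backward Dijkstra search from the goal as a stand-in for "a modern planner", and treats the search's cost table as the terminal value function of a short-horizon, brute-force MPC. The grid, the names `cost_to_go`, `mpc_step`, and `run`, and the perturbation mechanism are all hypothetical choices made for illustration.

```python
import heapq
import itertools

# Hypothetical 2D grid (an assumption for illustration): '#' is an obstacle.
GRID = [
    "....#...",
    "....#...",
    "....#.G.",
    "........",
]
GOAL = (2, 6)  # cell marked 'G' above
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # unit-cost 4-connected moves


def free(cell):
    """True if the cell lies inside the grid and is not an obstacle."""
    r, c = cell
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#"


def cost_to_go(goal):
    """Backward Dijkstra from the goal; the resulting table is read as V(s)."""
    value = {goal: 0}
    queue = [(0, goal)]
    while queue:
        dist, cell = heapq.heappop(queue)
        if dist > value[cell]:
            continue  # stale queue entry
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if free(nxt) and dist + 1 < value.get(nxt, float("inf")):
                value[nxt] = dist + 1
                heapq.heappush(queue, (dist + 1, nxt))
    return value


def mpc_step(state, goal, value, horizon=2):
    """Score every short action sequence as running cost + V(terminal state)
    and return the state reached by the first action of the best sequence."""
    best_cost, best_next = float("inf"), state
    for seq in itertools.product(MOVES, repeat=horizon):
        cell, cost, first = state, 0, None
        for dr, dc in seq:
            if cell == goal:  # goal is absorbing: no further cost accrues
                break
            nxt = (cell[0] + dr, cell[1] + dc)
            if not free(nxt):
                cost = float("inf")  # infeasible sequence
                break
            cell, cost = nxt, cost + 1
            if first is None:
                first = cell
        total = cost + value.get(cell, float("inf"))
        if first is not None and total < best_cost:
            best_cost, best_next = total, first
    return best_next


def run(start, goal, value, perturb=None, max_steps=50):
    """Closed-loop rollout. `perturb` maps a step index to the cell the robot
    is pushed to at that step; the value table is reused without replanning."""
    state, path = start, [start]
    for t in range(max_steps):
        if state == goal:
            break
        state = mpc_step(state, goal, value)
        if perturb and t in perturb:
            state = perturb[t]
        path.append(state)
    return path


V = cost_to_go(GOAL)
print(run((0, 0), GOAL, V))                        # nominal rollout
print(run((0, 0), GOAL, V, perturb={2: (0, 3)}))   # robot pushed at step 2
```

Because the cost table covers every reachable cell rather than a single path, the perturbed rollout continues toward the goal from wherever the robot lands, which is the property the abstract contrasts with path tracking. In this toy setting the table is exact, so even a horizon of two steps behaves as if it planned all the way to the goal; with an approximate value function, a longer horizon would be expected to compensate for some of the error.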
