We present a method for feedback motion planning of systems with unknown dynamics that provides probabilistic guarantees on safety, reachability, and goal stability. To find a domain in which a learned control-affine approximation of the true dynamics can be trusted, we estimate the Lipschitz constant of the difference between the true and learned dynamics and ensure that the estimate is valid with a given probability. Provided the system has at least as many controls as states, we also derive existence conditions for a one-step feedback law that can keep the real system within a small bound of a nominal trajectory planned with the learned dynamics. Our method imposes existence of this feedback law as a constraint in a sampling-based planner. The planner returns a feedback policy around a nominal plan which ensures that, if the Lipschitz constant estimate is valid, the true system is safe during plan execution, reaches the goal, and is ultimately invariant in a small set about the goal. We demonstrate our approach by planning with learned models of a 6D quadrotor and a 7-DOF Kuka arm. We show that a baseline which plans with the same learned dynamics, but ignores the error bound and the existence of the feedback law, can fail to stabilize around the plan and become unsafe.
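The abstract does not spell out the feedback law, so the sketch below only illustrates why having at least as many controls as states helps; it is not the paper's construction. It assumes a discrete-time learned model x_{t+1} = f̂(x_t) + ĝ(x_t) u_t, ignores control limits, and uses a made-up toy system whose true dynamics add a small error term. When ĝ(x_t) is n×m with m ≥ n and full row rank, one can pick the minimum-norm u_t = ĝᵀ(ĝĝᵀ)⁻¹(x̄_{t+1} − f̂(x_t)) so the learned model lands exactly on the nominal next state x̄_{t+1}; the true state then deviates from the nominal only by the model error at (x_t, u_t), which is the quantity a Lipschitz bound on the error is meant to control. All names (`one_step_feedback`, `f_hat`, `g_hat`, `model_error`) are hypothetical.

```python
import math

def solve_square(A, b):
    """Solve A y = b for a square A via Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("g_hat(x) G^T is singular: g_hat(x) lacks full row rank")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))
        y[r] = s / M[r][r]
    return y

def one_step_feedback(f_hat, g_hat, x, x_nom_next):
    """Hypothetical one-step law: minimum-norm u with f_hat(x) + g_hat(x) u = x_nom_next."""
    G = g_hat(x)  # n x m matrix stored as a list of rows
    n, m = len(G), len(G[0])
    if m < n:
        raise ValueError("need at least as many controls as states (m >= n)")
    fx = f_hat(x)
    r = [x_nom_next[i] - fx[i] for i in range(n)]
    GGt = [[sum(G[i][k] * G[j][k] for k in range(m)) for j in range(n)] for i in range(n)]
    y = solve_square(GGt, r)
    return [sum(G[i][k] * y[i] for i in range(n)) for k in range(m)]  # u = G^T y

def learned_step(f_hat, g_hat, x, u):
    """Prediction of the learned control-affine model."""
    fx, G = f_hat(x), g_hat(x)
    return [fx[i] + sum(G[i][k] * u[k] for k in range(len(u))) for i in range(len(fx))]

# Toy system (assumed for illustration): 2 states, 3 controls, time step dt.
dt = 0.1
f_hat = lambda x: [x[0] + dt * x[1], x[1]]
g_hat = lambda x: [[dt, 0.0, 0.5 * dt], [0.0, dt, 0.5 * dt]]
model_error = lambda x: [0.01 * math.sin(x[0]), 0.01 * math.sin(x[1])]  # unknown in practice

x = [1.0, -0.5]
x_nom_next = [0.9, -0.4]
u = one_step_feedback(f_hat, g_hat, x, x_nom_next)
x_pred = learned_step(f_hat, g_hat, x, u)
x_true_next = [p + e for p, e in zip(x_pred, model_error(x))]
deviation = math.dist(x_true_next, x_nom_next)
print(u, deviation)
```

Here the learned prediction hits x̄_{t+1} exactly, and the true deviation equals the norm of the toy error term (at most 0.01·√2 for this choice). With fewer controls than states, ĝ(x) cannot have full row rank, so an arbitrary nominal next state generally cannot be matched in one step, and the function raises instead.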