A variety of control tasks such as inverse kinematics (IK), trajectory optimization (TO), and model predictive control (MPC) are commonly formulated as energy minimization problems. Numerical solutions to such problems are well established, but they are often too slow to be used directly in real-time applications. The alternative is to learn solution manifolds for control problems in an offline stage. Although this distillation process can be trivially formulated as a behavioral cloning (BC) problem in an imitation learning setting, our experiments highlight a number of significant shortcomings arising from incompatible local minima, interpolation artifacts, and insufficient coverage of the state space. In this paper, we propose an alternative to BC that is efficient and numerically robust. We formulate the learning of solution manifolds as a minimization of the energy terms of a control objective, integrated over the space of problems of interest. We minimize this energy integral with a novel method that combines Monte Carlo-inspired adaptive sampling strategies with the derivatives used to solve individual instances of the control task. We evaluate the performance of our formulation on a series of robotic control problems of increasing complexity, and we highlight its benefits through comparisons against traditional methods such as behavioral cloning and dataset aggregation (DAgger).
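A minimal sketch of this formulation, with notation assumed here rather than taken from the paper: let $c$ denote a problem instance drawn from a distribution $p(c)$ over the space of problems of interest, $\pi_\theta$ the learned solution manifold parameterized by $\theta$, and $E(\cdot, c)$ the energy of the control objective for instance $c$. The learning problem can then be written, roughly, as

\min_{\theta} \; \mathcal{L}(\theta) \;=\; \int E\big(\pi_\theta(c),\, c\big)\, p(c)\, \mathrm{d}c \;\approx\; \frac{1}{N} \sum_{i=1}^{N} E\big(\pi_\theta(c_i),\, c_i\big), \qquad c_i \sim p(c),

so that the gradient with respect to $\theta$ can be estimated from sampled problem instances using the same derivatives of $E$ that a numerical solver would use on a single instance, while the adaptive sampling strategy mentioned above determines where those instances are drawn.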