This paper proposes an inverse optimal control method that enables a robot to incrementally learn a control objective function from a collection of trajectory segments. Here, "incrementally" means that the collection grows over time as additional segments are provided. The unknown objective function is parameterized as a weighted sum of features with unknown weights, and each trajectory segment is a small snippet of an optimal trajectory. The paper shows that each trajectory segment, if informative, imposes a linear constraint on the unknown weights; the objective function can therefore be learned by incrementally incorporating all informative segments. The effectiveness of the method is demonstrated on a simulated 2-link robot arm and a 6-DoF maneuvering quadrotor system, in each of which only small demonstration segments are available.
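The incremental idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how a segment is turned into a constraint, so the following is an assumption rather than the paper's algorithm: each informative segment is taken to supply a matrix `H_i` with `H_i @ w = 0`, and the weights are read off (up to scale) once the stacked constraints leave a one-dimensional null space. The class name `IncrementalWeightEstimator`, its methods, and the tolerance are hypothetical.

```python
import numpy as np


class IncrementalWeightEstimator:
    """Accumulate assumed constraints H_i @ w = 0 and estimate w up to scale.

    Hypothetical helper for illustration; not the paper's implementation.
    """

    def __init__(self, n_features, tol=1e-6):
        self.n_features = n_features
        self.tol = tol
        # Running sum of H_i^T H_i. Its null space is the common null
        # space of every constraint matrix added so far.
        self.gram = np.zeros((n_features, n_features))

    def add_segment(self, H):
        """Incorporate the constraint rows derived from one segment."""
        H = np.atleast_2d(np.asarray(H, dtype=float))
        if H.shape[1] != self.n_features:
            raise ValueError("constraint width must equal n_features")
        self.gram += H.T @ H

    def estimate(self):
        """Return unit-norm weights, or None if they are not yet unique."""
        eigvals, eigvecs = np.linalg.eigh(self.gram)  # ascending order
        threshold = self.tol * max(eigvals[-1], np.finfo(float).tiny)
        n_null = int(np.sum(eigvals <= threshold))
        if n_null != 1:
            return None  # too few (or inconsistent) constraints so far
        w = eigvecs[:, 0]
        w = w / np.linalg.norm(w)
        # Resolve the sign ambiguity: make the largest entry positive.
        if w[np.argmax(np.abs(w))] < 0:
            w = -w
        return w


if __name__ == "__main__":
    est = IncrementalWeightEstimator(n_features=3)
    # Two made-up constraint rows, both orthogonal to w = (1, 2, 3).
    est.add_segment([[2.0, -1.0, 0.0]])
    print(est.estimate())  # one constraint leaves w undetermined
    est.add_segment([[3.0, 0.0, -1.0]])
    print(est.estimate())  # now unique up to scale
```

In this toy run the first segment alone does not determine the weights, while the second one narrows the null space to a single direction proportional to (1, 2, 3). Keeping only the running Gram matrix means earlier segments need not be stored, which is one simple way to realize incremental incorporation under the stated assumption.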