This work evaluates and analyzes the combination of imitation learning (IL) and differentiable model predictive control (MPC) for human-like autonomous driving. We combine MPC with a hierarchical learning-based policy and measure its performance in open loop and closed loop using metrics related to safety, comfort, and similarity to human driving characteristics. We also demonstrate the value of augmenting open-loop behavioral cloning with closed-loop training for more robust learning, approximating the policy gradient through time with the state space model used by the MPC. We perform experimental evaluations on a lane-keeping control system, learned from demonstrations collected on a fixed-base driving simulator, and show that our imitative policies approach human driving style preferences.
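The difference between open-loop behavioral cloning and closed-loop training through a state space model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the method evaluated in this work: a two-state lane-keeping error model (`step`), a linear feedback policy with gains `k` standing in for the hierarchical MPC policy, and the constants `V` and `DT`. The closed-loop gradient is obtained by propagating state sensitivities forward through the model, which gives the same result as differentiating the rollout through time.

```python
# Hypothetical lane-keeping error model (an assumption, not the paper's model):
#   e_next   = e + V * DT * psi    (lateral offset)
#   psi_next = psi + DT * u        (heading error, u = yaw-rate command)
V, DT = 10.0, 0.1  # assumed speed [m/s] and time step [s]


def step(e, psi, u):
    """One step of the assumed state space model."""
    return e + V * DT * psi, psi + DT * u


def policy(k, e, psi):
    """Assumed linear feedback policy; stands in for the MPC-based policy."""
    return -(k[0] * e + k[1] * psi)


def rollout(k, e0, psi0, horizon):
    """Closed-loop rollout of the policy through the model."""
    states = [(e0, psi0)]
    for _ in range(horizon):
        e, psi = states[-1]
        states.append(step(e, psi, policy(k, e, psi)))
    return states


def open_loop_loss(k, demo_states, demo_actions):
    """Behavioral cloning: match demonstrated actions at demonstrated states."""
    errors = [(policy(k, e, psi) - u) ** 2
              for (e, psi), u in zip(demo_states, demo_actions)]
    return sum(errors) / len(errors)


def closed_loop_loss_and_grad(k, demo_states):
    """State-matching loss of the policy's own rollout, with its gradient.

    de[i] and dpsi[i] hold the sensitivities of the current state with
    respect to gain k[i]; they are pushed through the model at every step.
    """
    e, psi = demo_states[0]
    de, dpsi = [0.0, 0.0], [0.0, 0.0]
    n = len(demo_states) - 1
    loss, grad = 0.0, [0.0, 0.0]
    for t in range(1, n + 1):
        u = policy(k, e, psi)
        # du/dk_i: indirect term through the state plus direct term
        du = [-(k[0] * de[i] + k[1] * dpsi[i]) for i in range(2)]
        du[0] -= e
        du[1] -= psi
        # propagate sensitivities with the same model as the state
        de, dpsi = ([de[i] + V * DT * dpsi[i] for i in range(2)],
                    [dpsi[i] + DT * du[i] for i in range(2)])
        e, psi = step(e, psi, u)
        e_ref, psi_ref = demo_states[t]
        loss += ((e - e_ref) ** 2 + (psi - psi_ref) ** 2) / n
        for i in range(2):
            grad[i] += 2.0 * ((e - e_ref) * de[i]
                              + (psi - psi_ref) * dpsi[i]) / n
    return loss, grad


# Synthetic "demonstration" produced by hypothetical expert gains.
K_EXPERT = [0.5, 1.5]
demo_states = rollout(K_EXPERT, 1.0, 0.0, 30)
demo_actions = [policy(K_EXPERT, e, psi) for e, psi in demo_states[:-1]]

k = [0.3, 1.0]
bc_loss = open_loop_loss(k, demo_states, demo_actions)
cl_loss, cl_grad = closed_loop_loss_and_grad(k, demo_states)
print("open-loop loss:", bc_loss)
print("closed-loop loss and gradient:", cl_loss, cl_grad)
```

The open-loop loss only evaluates the policy at states the demonstrator visited, while the closed-loop loss evaluates it at the states its own actions lead to, so compounding errors show up directly in the training signal. In practice an automatic differentiation framework would replace the hand-written sensitivity propagation.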