To ensure user acceptance of autonomous vehicles (AVs), control systems are being developed that mimic human drivers from demonstrations of desired driving behaviors. Imitation learning (IL) algorithms serve this purpose but struggle to provide safety guarantees on the resulting closed-loop system trajectories. Model Predictive Control (MPC), on the other hand, can handle nonlinear systems with safety constraints, but realizing human-like driving with it requires extensive domain knowledge. This work proposes a seamless combination of the two techniques to learn safe AV controllers from demonstrations of desired driving behaviors, using MPC as a differentiable control layer within a hierarchical IL policy. With this strategy, IL is performed closed-loop and end-to-end, through parameters in the MPC cost, model, or constraints. Experimental results of this methodology are analyzed for the design of a lane-keeping control system, learned via behavioral cloning from observations (BCO) using human demonstrations collected on a fixed-base driving simulator.
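The core idea of learning through a differentiable MPC layer can be illustrated with a deliberately minimal sketch (this is an assumption-laden toy, not the paper's implementation): a one-step MPC for a scalar linear system, whose optimal input has a closed form that is differentiable in the cost weight `r`, so behavioral cloning can fit `r` to demonstrations by gradient descent through the controller.

```python
# Minimal sketch (illustrative, NOT the paper's method): one-step MPC for the
# scalar system x' = a*x + b*u with cost q*x'^2 + r*u^2. The minimizer over u
# is u*(x) = -q*a*b*x / (q*b^2 + r), a policy that is differentiable in the
# cost weight r, so r can be learned end-to-end from demonstrations.

A, B, Q = 1.0, 0.5, 1.0  # fixed model and state-cost weight (made-up values)

def mpc_action(x, r):
    """Closed-form minimizer of Q*(A*x + B*u)**2 + r*u**2 over u."""
    return -Q * A * B * x / (Q * B**2 + r)

def bc_loss_and_grad(demos, r):
    """Mean-squared imitation loss and its analytic gradient w.r.t. r."""
    loss, grad = 0.0, 0.0
    for x, u_demo in demos:
        u = mpc_action(x, r)
        du_dr = Q * A * B * x / (Q * B**2 + r) ** 2  # quotient rule
        loss += (u - u_demo) ** 2 / len(demos)
        grad += 2.0 * (u - u_demo) * du_dr / len(demos)
    return loss, grad

# Synthetic "demonstrations" from an expert using r_true = 0.2.
r_true = 0.2
demos = [(x, mpc_action(x, r_true)) for x in (-2.0, -0.5, 1.0, 3.0)]

# Behavioral cloning through the controller: recover the cost weight by
# gradient descent from a wrong initial guess.
r = 1.0
for _ in range(500):
    loss, grad = bc_loss_and_grad(demos, r)
    r -= 0.01 * grad
```

In the paper's setting the same principle applies at scale: the MPC is a full constrained optimization over a horizon, and gradients of the imitation loss are propagated through its solution to the cost, model, or constraint parameters rather than derived in closed form.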