The imitation learning of self-driving vehicle policies through behavioral cloning is often carried out in an open-loop fashion, ignoring the effect of actions on future states. Training such policies purely with Empirical Risk Minimization (ERM) can be detrimental to real-world performance, as it biases policy networks towards matching only open-loop behavior, showing poor results when evaluated in closed-loop. In this work, we develop an efficient and simple-to-implement principle called Closed-loop Weighted Empirical Risk Minimization (CW-ERM), in which a closed-loop evaluation procedure is first used to identify training data samples that are important for practical driving performance, and then we use these samples to help debias the policy network. We evaluate CW-ERM on a challenging urban driving dataset and show that this procedure yields a significant reduction in collisions as well as improvements in other non-differentiable closed-loop metrics.
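The core idea above, reweighting the ERM objective using samples flagged by closed-loop evaluation, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the binary up-weighting scheme, and the constant `upweight` factor are all assumptions for exposition.

```python
import numpy as np

def cw_erm_weights(sample_ids, failing_ids, upweight=5.0):
    """Assign per-sample weights: samples that closed-loop evaluation
    flagged as failures (hypothetical `failing_ids` set) are up-weighted,
    all others keep unit weight."""
    return np.array([upweight if s in failing_ids else 1.0 for s in sample_ids])

def weighted_erm_loss(per_sample_losses, weights):
    """Weighted Empirical Risk Minimization: a weighted mean of the
    per-sample imitation losses, biasing training towards samples
    that matter for closed-loop driving performance."""
    return float(np.sum(weights * per_sample_losses) / np.sum(weights))
```

In practice, the per-sample losses would come from the behavioral-cloning objective, and the failing set from rolling out the policy in a closed-loop simulator and checking metrics such as collisions.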