When in a new situation or geographical location, human drivers have an extraordinary ability to watch others and learn maneuvers that they themselves may never have performed. In contrast, existing techniques for learning to drive preclude such a possibility, as they assume direct access to an instrumented ego-vehicle with fully known observations and expert driver actions. Such measurements cannot be directly accessed for non-ego vehicles when learning by watching others. Therefore, in an application where data is regarded as a highly valuable asset, current approaches completely discard the vast portion of training data that could potentially be obtained through indirect observation of surrounding vehicles. Motivated by this key insight, we propose the Learning by Watching (LbW) framework, which enables learning a driving policy without requiring full knowledge of either the state or the expert actions. To augment its training data with new perspectives and maneuvers, LbW makes use of the demonstrations of other vehicles in a given scene by (1) transforming the ego-vehicle's observations to their points of view, and (2) inferring their expert actions. Our LbW agent learns more robust driving policies while enabling data-efficient learning, including quick adaptation of the policy to rare and novel scenarios. In particular, LbW drives robustly even with a fraction of the driving data required by existing methods, achieving an average success rate of 92% on the original CARLA benchmark with only 30 minutes of total driving data and 82% with only 10 minutes.