Imagine an autonomous robot vehicle driving in dense, possibly unregulated urban traffic. To contend with an uncertain, interactive environment with many traffic participants, the robot vehicle has to perform long-term planning in order to drive effectively and approach human-level performance. Planning explicitly over a long time horizon, however, incurs prohibitive computational cost and is impractical under real-time constraints. To achieve real-time performance for large-scale planning, this paper introduces Learning from Tree Search for Driving (LeTS-Drive), which integrates planning and learning in a closed loop. LeTS-Drive learns a driving policy from a planner based on sparsely-sampled tree search. It then uses this learned policy to guide online planning for real-time vehicle control. These two steps are repeated to form a closed loop, so that the planner and the learner inform each other and improve in synchrony. The entire algorithm evolves on its own in a self-supervised manner, without explicit human effort on data labeling. We applied LeTS-Drive to autonomous driving in crowded urban environments in simulation. Experimental results show that LeTS-Drive outperforms planning alone, learning alone, and open-loop integration of planning and learning.
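The closed loop described in the abstract (the planner produces driving decisions, the learner imitates them, and the learned policy then steers the next round of planning) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the 1-D toy road, the names `TabularPolicy`, `tree_search`, and `closed_loop`, the tabular count-based learner, and the hand-written leaf heuristic are all hypothetical stand-ins.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, 1)  # toy action set: reverse, stay, advance (assumption)
GOAL = 5              # toy goal cell on a 1-D road (assumption)


def step(state, action):
    """Toy deterministic dynamics; the state is clamped to [0, GOAL]."""
    return max(0, min(GOAL, state + action))


class TabularPolicy:
    """Learner: per-state action counts that imitate the planner's choices."""

    def __init__(self):
        # Start from a uniform prior of one pseudo-count per action.
        self.counts = defaultdict(lambda: [1.0] * len(ACTIONS))

    def probs(self, state):
        counts = self.counts[state]
        total = sum(counts)
        return [c / total for c in counts]

    def update(self, state, action):
        self.counts[state][ACTIONS.index(action)] += 1.0


def tree_search(state, policy, depth, width, rng):
    """Planner: depth-limited search returning (value, best_action).

    Each node expands at most `width` actions sampled from the learned
    policy, so the policy steers the search toward promising branches.
    """
    if state == GOAL:
        return 0.0, 0
    if depth == 0:
        return -float(GOAL - state), 0  # hand-written leaf heuristic (assumption)
    if width >= len(ACTIONS):
        candidates = ACTIONS  # enough budget to expand every action
    else:
        drawn = rng.choices(ACTIONS, weights=policy.probs(state), k=width)
        candidates = sorted(set(drawn))
    best_value, best_action = float("-inf"), candidates[0]
    for action in candidates:
        value, _ = tree_search(step(state, action), policy, depth - 1, width, rng)
        value -= 1.0  # every move costs one time step
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


def closed_loop(iterations, episodes, depth, width, seed=0):
    """Alternate between planning with the current policy and learning from it."""
    rng = random.Random(seed)
    policy = TabularPolicy()
    for _ in range(iterations):
        data = []
        for _ in range(episodes):
            state = 0
            for _ in range(2 * GOAL):  # step cap so every episode terminates
                if state == GOAL:
                    break
                _, action = tree_search(state, policy, depth, width, rng)
                data.append((state, action))  # the planner labels the data itself
                state = step(state, action)
        for state, action in data:  # the learner imitates the planner
            policy.update(state, action)
    return policy
```

Calling `closed_loop(iterations=3, episodes=5, depth=3, width=2)` exercises the sparse case, where each node expands fewer actions than are available and the learned probabilities decide which branches get searched. The training labels come from the planner's own decisions, which mirrors the self-supervised setup the abstract describes. A real system would need a learned function approximator and a planner that handles uncertainty, neither of which this sketch attempts.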