Imagine an autonomous robot vehicle driving in dense, possibly unregulated urban traffic. To contend with an uncertain, interactive environment with heterogeneous traffic of cars, motorcycles, buses, and other vehicles, the robot vehicle has to plan over both the short and the long term in order to drive effectively and approach human-level performance. Planning explicitly over a long time horizon, however, incurs prohibitive computational cost and is impractical under real-time constraints. To achieve real-time performance for large-scale planning, this work introduces Learning from Tree Search for Driving (LeTS-Drive), which integrates planning and learning in a closed loop. LeTS-Drive learns a driving policy from a planner based on sparsely sampled tree search. The learned policy in turn guides online planning for real-time vehicle control. These two steps are repeated to form a closed loop, so that the planner and the learner inform each other and improve in synchrony. The entire system learns on its own in a self-supervised manner, without human effort on explicit data labeling. We applied LeTS-Drive to autonomous driving in crowded urban environments in simulation. Experimental results show that LeTS-Drive outperforms either planning or learning alone, as well as open-loop integration of planning and learning.
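The closed loop described above can be illustrated with a toy sketch. This is not the LeTS-Drive implementation: the 1-D world, the tabular count-based policy, and every name (`step`, `policy_probs`, `rollout`, `plan`, `closed_loop`) are hypothetical, chosen only for illustration. A depth-limited tree search over deterministic toy dynamics stands in for the sparsely sampled tree search, and the learned policy guides the planner through leaf rollouts, which is an assumption, since the abstract does not say how the guidance works.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)  # toy action set: step left or right (assumption)
GOAL = 5            # toy goal state on a 1-D line (assumption)
GAMMA = 0.9         # discount factor (assumption)


def step(state, action):
    """Toy deterministic dynamics: reward 1 on reaching GOAL, else 0."""
    nxt = max(0, min(GOAL, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0)


def policy_probs(counts, state):
    """Learned policy: smoothed frequencies of the planner's past choices."""
    totals = [counts.get((state, a), 0.0) + 1.0 for a in ACTIONS]
    z = sum(totals)
    return [t / z for t in totals]


def rollout(counts, state, depth, rng):
    """Estimate a leaf value by following the learned policy."""
    ret, discount = 0.0, 1.0
    for _ in range(depth):
        if state == GOAL:
            break
        weights = policy_probs(counts, state)
        action = rng.choices(ACTIONS, weights=weights)[0]
        state, reward = step(state, action)
        ret += discount * reward
        discount *= GAMMA
    return ret


def plan(counts, state, rng, depth=2, rollout_depth=8, samples=4):
    """Shallow tree search; leaves are evaluated by policy rollouts."""
    def value(s, d):
        if s == GOAL:
            return 0.0  # terminal state: no further reward
        if d == 0:
            total = sum(rollout(counts, s, rollout_depth, rng)
                        for _ in range(samples))
            return total / samples
        return max(q(s, a, d) for a in ACTIONS)

    def q(s, a, d):
        nxt, reward = step(s, a)
        return reward + GAMMA * value(nxt, d - 1)

    return max(ACTIONS, key=lambda a: q(state, a, depth))


def closed_loop(iterations=20, horizon=12, seed=0):
    """Alternate planning and learning so that each informs the other."""
    rng = random.Random(seed)
    counts = defaultdict(float)  # (state, action) -> planner choice count
    for _ in range(iterations):
        state = 0
        for _ in range(horizon):
            if state == GOAL:
                break
            action = plan(counts, state, rng)  # planning, guided by the policy
            counts[(state, action)] += 1.0     # learning, labeled by the planner
            state, _ = step(state, action)
    return counts


if __name__ == "__main__":
    learned = closed_loop()
    for s in range(GOAL):
        print(s, policy_probs(learned, s))
```

The training labels come from the planner's own decisions rather than from human annotation, which mirrors the self-supervised character described above. In this sketch, as the counts accumulate, the leaf rollouts follow the planner's past choices more closely, so the shallow search relies increasingly on the learned policy instead of a deeper tree.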