Real-time planning under uncertainty is critical for robots operating in complex dynamic environments. Consider, for example, an autonomous robot vehicle driving in dense, unregulated urban traffic among cars, motorcycles, buses, and other vehicles. The vehicle must plan over both short and long horizons in order to interact with many traffic participants whose intentions are uncertain and still drive effectively. Planning explicitly over a long time horizon, however, incurs prohibitive computational costs and is impractical under real-time constraints. To achieve real-time performance for large-scale planning, this work introduces a new algorithm, Learning from Tree Search for Driving (LeTS-Drive), which integrates planning and learning in a closed loop, and applies it to autonomous driving in crowded urban traffic in simulation. Specifically, LeTS-Drive learns a policy and its value function from data provided by an online planner, which searches a sparsely sampled belief tree; the online planner in turn uses the learned policy and value function as heuristics to scale up its run-time performance for real-time robot control. These two steps are repeated to form a closed loop, so that the planner and the learner inform each other and improve in synchrony. The algorithm learns on its own in a self-supervised manner, without human effort spent on explicit data labeling. Experimental results demonstrate that LeTS-Drive outperforms planning alone, learning alone, and open-loop integration of planning and learning.
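The closed loop between planner and learner can be illustrated with a minimal sketch. This is not the LeTS-Drive implementation: it assumes a deterministic toy chain task instead of urban driving, a depth-limited lookahead instead of a sparsely sampled belief tree, and a tabular "learner" that simply copies the planner's output instead of training a neural network. All names (`step`, `plan`, `closed_loop`, `N_STATES`, `GAMMA`) are hypothetical and chosen only for illustration.

```python
GAMMA = 0.9            # discount factor (assumed for this toy task)
N_STATES = 8           # states 0..7 on a line; the last state is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)     # move left or move right


def step(state, action):
    """Toy deterministic dynamics: reward 1 when the goal is entered."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def plan(state, depth, value, policy):
    """Depth-limited search that scores leaves with the learned value table.

    Returns (estimated value, chosen action). The learned policy's action
    is tried first, so it wins ties between equally good actions.
    """
    if state == GOAL:
        return 0.0, policy[state]          # terminal state: nothing left to gain
    if depth == 0:
        return value[state], policy[state]  # leaf: fall back on learned heuristic
    best_q, best_a = float("-inf"), policy[state]
    for a in sorted(ACTIONS, key=lambda a: a != policy[state]):
        nxt, r = step(state, a)
        q = r + GAMMA * plan(nxt, depth - 1, value, policy)[0]
        if q > best_q:
            best_q, best_a = q, a
    return best_q, best_a


def closed_loop(iterations, depth=2):
    """Alternate planning and learning for a fixed number of iterations."""
    value = [0.0] * N_STATES
    policy = [-1] * N_STATES               # deliberately poor initial policy
    for _ in range(iterations):
        # Planner: generate targets using the current learned heuristics.
        targets = [plan(s, depth, value, policy) for s in range(N_STATES)]
        # Learner: fit to the planner's output (tabular, so just copy it).
        value = [q for q, _ in targets]
        policy = [a for _, a in targets]
    return value, policy


if __name__ == "__main__":
    value, policy = closed_loop(iterations=4)
    print(policy[:GOAL], round(value[0], 6))
```

In this toy setting the search depth of 2 is too shallow to see the goal from distant states, so a single planning pass leaves most of the policy unchanged. Each pass through the loop pushes the learned value estimates two states further from the goal, which lets the same shallow search reach correct decisions everywhere after a few iterations. That is the intuition behind using learned heuristics to shorten the effective planning horizon, although the actual algorithm operates on beliefs and learned function approximators.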