Decision and control are two of the core functionalities of high-level automated vehicles. Current mainstream methods, such as functionality decomposition and end-to-end reinforcement learning (RL), suffer from either high time complexity or poor interpretability and limited safety performance in complex real-world autonomous driving tasks. In this paper, we present an interpretable and efficient decision and control framework for automated vehicles, which decomposes the driving task into multi-path planning and optimal tracking, structured hierarchically. First, multi-path planning generates several candidate paths considering only static constraints. Then, optimal tracking follows the optimal path while taking dynamic obstacles into account. To that end, in principle, we formulate a constrained optimal control problem (OCP) for each candidate path, optimize each one separately, and choose the path with the best tracking performance to follow. More importantly, we propose a model-based RL algorithm that serves as an approximate constrained OCP solver, offloading the heavy computation through a paradigm of offline training and online application. Specifically, the OCPs for all paths are combined into a multi-task RL problem, which our algorithm solves offline into value and policy networks that are used online for real-time path selection and tracking, respectively. We verify our framework both in simulation and in the real world. Results show that, compared with baseline methods, our method achieves better online computing efficiency and better driving performance, including traffic efficiency and safety. In addition, it offers strong interpretability and adapts well across different driving tasks. The real road test also suggests that it is applicable to complicated traffic scenarios without any tuning.
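The online phase described above, in which a value network ranks the candidate paths and a policy network tracks the selected one, can be sketched as follows. This is a minimal illustration under assumptions of our own: the state and path encodings, the class and function names, and the toy value and policy functions are hypothetical stand-ins for the trained networks, not the paper's implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical encodings for illustration only:
#   state = (lateral position y [m], heading error psi [rad], obstacle lateral position [m])
#   path  = reference lateral offset of a candidate lane-centre path [m]
State = Tuple[float, float, float]
Path = float


@dataclass
class PathSelectAndTrack:
    """Online phase: a value function ranks candidate paths, a policy tracks the chosen one."""
    value_fn: Callable[[State, Path], float]   # estimated tracking cost (lower is better)
    policy_fn: Callable[[State, Path], float]  # control command for tracking the given path

    def select(self, state: State, paths: Sequence[Path]) -> int:
        # Query the value function once per candidate path and pick the cheapest one.
        costs = [self.value_fn(state, p) for p in paths]
        return min(range(len(paths)), key=costs.__getitem__)

    def step(self, state: State, paths: Sequence[Path]) -> Tuple[int, float]:
        idx = self.select(state, paths)
        return idx, self.policy_fn(state, paths[idx])


# Toy stand-ins for the trained networks (hypothetical, NOT the paper's networks).
def toy_value(state: State, path: Path) -> float:
    y, psi, obs_y = state
    tracking = (y - path) ** 2 + 0.1 * psi ** 2
    # Soft penalty when a dynamic obstacle sits near this path's lateral position.
    obstacle = 10.0 * math.exp(-((obs_y - path) ** 2))
    return tracking + obstacle


def toy_policy(state: State, path: Path) -> float:
    y, psi, _ = state
    # Proportional steering toward the reference path (gains are arbitrary).
    return -0.5 * (y - path) - 1.0 * psi


controller = PathSelectAndTrack(toy_value, toy_policy)
# An obstacle on the centre path (offset 0.0) makes the left path (3.5) the cheapest.
idx, steer = controller.step((0.4, 0.0, 0.0), [0.0, 3.5, -3.5])
print(idx, round(steer, 2))  # 1 1.55
```

In this sketch each control step costs one value evaluation per candidate path plus one policy evaluation, so the online work stays small and fixed, while all the expensive optimization is assumed to have happened offline when the networks were trained.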