Decision and control are core functionalities of high-level automated vehicles. Current mainstream methods, such as functionality decomposition and end-to-end reinforcement learning (RL), suffer from either high time complexity or poor interpretability and adaptability on real-world autonomous driving tasks. In this paper, we present an interpretable and computationally efficient framework called integrated decision and control (IDC) for automated vehicles, which decomposes the driving task into two hierarchically structured layers: static path planning and dynamic optimal tracking. First, static path planning generates several candidate paths considering only static traffic elements. Then, dynamic optimal tracking tracks the optimal path while accounting for dynamic obstacles. To that end, we formulate a constrained optimal control problem (OCP) for each candidate path, optimize them separately, and follow the one with the best tracking performance. To offload the heavy online computation, we propose a model-based RL algorithm that serves as an approximate constrained OCP solver. Specifically, the OCPs for all paths are considered together to construct a single complete RL problem, which is solved offline in the form of value and policy networks, used for real-time online path selection and tracking, respectively. We verify our framework both in simulation and in the real world. Results show that, compared with baseline methods, IDC has an order of magnitude higher online computing efficiency, as well as better driving performance, including traffic efficiency and safety. In addition, it yields great interpretability and adaptability across different driving tasks. The effectiveness of the proposed method is also demonstrated in real road tests with complicated traffic conditions.
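
The online stage described above (a value network for path selection, a policy network for tracking) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_and_track`, the tuple encodings of state, path, and action, and the toy stand-ins for the trained networks are all hypothetical. The sketch also assumes that the value function predicts a tracking cost, so that the lowest value marks the best candidate path.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical encodings, chosen only for illustration.
State = Tuple[float, ...]
Path = Tuple[float, ...]
Action = Tuple[float, ...]


def select_and_track(
    state: State,
    candidate_paths: Sequence[Path],
    value_fn: Callable[[State, Path], float],
    policy_fn: Callable[[State, Path], Action],
) -> Tuple[int, Action]:
    """Pick the candidate path with the lowest predicted cost,
    then query the policy for the control action on that path."""
    if not candidate_paths:
        raise ValueError("at least one candidate path is required")
    # Assumption: value_fn returns a cost, so smaller is better.
    costs = [value_fn(state, path) for path in candidate_paths]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, policy_fn(state, candidate_paths[best])


# Toy stand-ins for the offline-trained networks (illustrative only).
def toy_value(state: State, path: Path) -> float:
    # Cost: squared lateral offset between the vehicle and the path.
    return (state[0] - path[0]) ** 2


def toy_policy(state: State, path: Path) -> Action:
    # Proportional steering toward the selected path.
    return (0.5 * (path[0] - state[0]),)


if __name__ == "__main__":
    paths = [(0.0,), (1.5,), (4.0,)]
    index, action = select_and_track((1.0,), paths, toy_value, toy_policy)
    print(index, action)
```

Because both functions are evaluated by a forward pass rather than by solving each OCP online, the per-step cost in this sketch grows only with the number of candidate paths, which is consistent with the efficiency argument made in the abstract.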