Self-evolution is indispensable for realizing fully autonomous driving. This paper presents a self-evolving decision-making system based on Integrated Decision and Control (IDC), an advanced framework built on reinforcement learning (RL). First, an RL algorithm called constrained mixed policy gradient (CMPG) is proposed to continually upgrade the driving policy of the IDC. It adapts the mixed policy gradient (MPG) with the penalty method so that it can solve constrained optimization problems using both data and a model. Second, an attention-based encoding (ABE) method is designed to tackle the state representation issue. It introduces an embedding network for feature extraction and a weighting network for feature fusion, achieving order-insensitive encoding and distinguishing the importance of individual road users. Finally, by fusing CMPG and ABE, we develop the first data-driven decision and control system under the IDC architecture and deploy it on a fully functional self-driving vehicle running in daily operation. Experimental results show that, boosted by data, the system achieves better driving ability than model-based methods. It also demonstrates safe, efficient, and smart driving behavior in various complex scenes at a signalized intersection with real mixed traffic flow.
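To make the ABE idea concrete, the following is a minimal sketch of an order-insensitive encoder, not the paper's implementation. It assumes a single shared linear layer with tanh as the embedding network, a linear scoring vector followed by a softmax as the weighting network, and a hypothetical per-user feature layout of (x, y, speed, heading). The abstract specifies none of these details, and the random weights merely stand in for trained parameters.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) matrix standing in for trained weights (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def linear(W, x):
    """Matrix-vector product W @ x using plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_encode(road_users, W_embed, w_score):
    """Encode a non-empty set of road users into one fixed-size vector.

    Embedding network (assumed form): the same tanh(W_embed @ u) for every user.
    Weighting network (assumed form): scalar score per embedding, then softmax.
    The output is the weighted sum of embeddings, so input order does not matter.
    """
    embeddings = [[math.tanh(v) for v in linear(W_embed, u)] for u in road_users]
    scores = [sum(w * e for w, e in zip(w_score, emb)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(W_embed)
    encoded = [
        sum(wt * emb[k] for wt, emb in zip(weights, embeddings)) for k in range(dim)
    ]
    return encoded, weights


rng = random.Random(0)
W_embed = make_linear(4, 8, rng)                  # 4 features -> 8-dim embedding
w_score = [rng.uniform(-1.0, 1.0) for _ in range(8)]

# Hypothetical road users: (x, y, speed, heading)
users = [
    [5.0, 1.0, 3.0, 0.0],
    [-2.0, 4.0, 1.5, 1.57],
    [10.0, -3.0, 0.0, 3.14],
]
encoded, weights = attention_encode(users, W_embed, w_score)

# Feeding the same users in a different order yields the same encoding.
shuffled = [users[2], users[0], users[1]]
encoded_shuffled, _ = attention_encode(shuffled, W_embed, w_score)
```

Because every user passes through the same embedding and the fusion is a weighted sum, the encoding has a fixed size regardless of how many road users are present, and the softmax weights can be read as a per-user importance score.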