Reinforcement learning (RL) has been widely adopted to learn intelligent driving policies for autonomous driving, owing to its self-evolution ability and human-like learning paradigm. Despite many elegant demonstrations of RL-enabled decision-making, current research mainly focuses on pure vehicle driving environments and ignores other traffic participants such as bicycles and pedestrians. On urban roads, the interaction of mixed traffic flows produces highly dynamic and complex relationships, which makes it very difficult to learn a safe and intelligent policy. This paper proposes encoding integrated decision and control (E-IDC) to handle complicated driving tasks with mixed traffic flows. E-IDC consists of an encoding function that constructs driving states, a value function that chooses the optimal path, and a policy function that outputs the control command of the ego vehicle. Specifically, the encoding function can handle different types and varying numbers of traffic participants and extract features from the original driving observation. Next, we design the training principle for the functions of E-IDC with RL algorithms by adding gradient-based update rules, and we refine the safety constraints to account for the differences among participants. Verification is conducted on an intersection scenario with mixed traffic flows, and the results show that E-IDC enhances driving performance, including tracking performance and satisfaction of safety constraints, by a large margin. The online application indicates that E-IDC can realize efficient and smooth driving in the complex intersection, guaranteeing intelligence and safety simultaneously.
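To make the role of the encoding function concrete, the following is a minimal sketch of one common way to map a variable number of typed traffic participants to a fixed-length driving state: a shared per-participant network followed by sum pooling. This is an illustrative assumption, not necessarily the architecture used in E-IDC; the names `TYPES`, `init_encoder`, and `encode`, the layer sizes, and the random weights are all hypothetical.

```python
import numpy as np

# Hypothetical participant categories; the abstract mentions vehicles,
# bicycles, and pedestrians.
TYPES = ("vehicle", "bicycle", "pedestrian")


def init_encoder(feat_dim, hidden_dim, out_dim, seed=0):
    """Create random weights for a small shared two-layer network (assumed sizes)."""
    rng = np.random.default_rng(seed)
    in_dim = feat_dim + len(TYPES)  # raw features plus a one-hot type tag
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def encode(params, participants):
    """Map a list of (type_name, features) pairs to one fixed-length vector.

    Each participant is embedded by the same network and the embeddings are
    summed, so the result does not depend on participant order or count.
    """
    out_dim = params["b2"].shape[0]
    if not participants:
        # No surrounding participants: return an all-zero encoding.
        return np.zeros(out_dim)
    rows = []
    for type_name, feats in participants:
        one_hot = np.zeros(len(TYPES))
        one_hot[TYPES.index(type_name)] = 1.0
        rows.append(np.concatenate([np.asarray(feats, dtype=float), one_hot]))
    x = np.stack(rows)                                   # (n, in_dim)
    h = np.tanh(x @ params["W1"] + params["b1"])         # (n, hidden_dim)
    e = np.tanh(h @ params["W2"] + params["b2"])         # (n, out_dim)
    return e.sum(axis=0)                                 # (out_dim,)


# Example: features are an assumed (x, y, speed, heading) tuple.
params = init_encoder(feat_dim=4, hidden_dim=16, out_dim=8)
scene = [
    ("vehicle", [10.0, 2.0, 5.0, 0.1]),
    ("pedestrian", [3.0, -1.0, 1.2, 1.5]),
    ("bicycle", [6.0, 0.5, 3.0, 0.0]),
]
state = encode(params, scene)
```

Under these assumptions the encoded state always has length `out_dim`, so a downstream value or policy function can take it as input regardless of how many participants are observed.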