The use of neural networks and reinforcement learning has become increasingly popular in autonomous vehicle control. However, the opacity of the resulting control policies presents a significant barrier to deploying neural network-based control in autonomous vehicles. In this paper, we present a reinforcement learning-based approach to autonomous vehicle longitudinal control, in which rule-based safety cages provide enhanced safety for the vehicle as well as weak supervision to the reinforcement learning agent. By guiding the agent towards meaningful states and actions, this weak supervision improves convergence during training and enhances the safety of the final trained policy. The rule-based supervisory controller has the further advantage of being fully interpretable, thereby enabling traditional validation and verification approaches to ensure the safety of the vehicle. We compare models with and without safety cages, as well as models with optimal and constrained model parameters, and show that the weak supervision consistently improves the safety of exploration, speed of convergence, and model performance. Additionally, we show that when the model parameters are constrained or suboptimal, the safety cages can enable a model to learn a safe driving policy even when the model could not be trained to drive through reinforcement learning alone.
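To make the interaction between the safety cage and the learning agent concrete, the following is a minimal illustrative sketch, not taken from the paper: it assumes a single normalized longitudinal action in [-1, 1] and a simple time-headway rule, and all identifiers (SafetyCage, MIN_TIME_HEADWAY, the penalty value) are hypothetical. The cage overrides unsafe actions before they reach the vehicle and returns a penalty that can serve as the weak supervision signal described above.

```python
import numpy as np

# Hypothetical thresholds; the paper's actual rule parameters are not reproduced here.
MIN_TIME_HEADWAY = 2.0   # seconds to the lead vehicle below which braking is forced
MAX_BRAKE = -1.0         # full braking command in a normalized [-1, 1] action space


class SafetyCage:
    """Illustrative rule-based safety cage (a sketch, not the paper's exact rules).

    Overrides unsafe longitudinal actions and emits a penalty that can be added
    to the reward as a weak supervision signal for the RL agent.
    """

    def filter(self, action: float, time_headway: float) -> tuple[float, float]:
        """Return (possibly overridden action, supervision penalty)."""
        if time_headway < MIN_TIME_HEADWAY and action > MAX_BRAKE:
            # Rule fired: force braking and penalize the agent's proposed action.
            return MAX_BRAKE, -1.0
        return float(np.clip(action, -1.0, 1.0)), 0.0


# Sketch of use inside a generic training loop (env and agent are assumed to exist):
# cage = SafetyCage()
# safe_action, penalty = cage.filter(agent_action, obs["time_headway"])
# next_obs, reward, done, info = env.step(safe_action)
# agent.observe(obs, agent_action, reward + penalty, next_obs, done)
```

Because the override logic consists only of explicit rules such as the one above, it can be inspected and verified independently of the learned policy, which is the interpretability advantage noted in the abstract.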