Driving in a dynamic, multi-agent urban environment is a difficult task that requires a complex decision-making policy. Learning such a policy requires a state representation that can encode the entire environment. Mid-level representations that encode a vehicle's environment as images have become a popular choice, but they are high-dimensional, which limits their use in data-hungry approaches such as reinforcement learning. In this article, we propose to learn a low-dimensional yet rich latent representation of the environment by leveraging knowledge of relevant semantic factors. To do this, we train an encoder-decoder deep neural network to predict multiple application-relevant factors, such as the trajectories of other agents and of the ego car. Furthermore, we propose a hazard signal based on other vehicles' future trajectories and the planned route, which is used together with the learned latent representation as input to a downstream policy. We demonstrate that the multi-head encoder-decoder neural network yields a more informative representation than a standard single-head model. In particular, the proposed representation learning and the hazard signal help reinforcement learning to learn faster, with higher performance and less data than baseline methods.
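
To make the idea of a hazard signal more concrete, the following is a minimal sketch of one way such a signal could be computed and combined with a latent vector. It is not the method from the article: the binary form of the signal, the proximity test, the `radius` threshold, and the names `hazard_signal` and `policy_input` are all assumptions made for illustration.

```python
import math


def hazard_signal(route, predicted_trajectories, radius=2.0):
    """Hypothetical hazard signal (an assumption, not the article's definition).

    Returns 1.0 if any predicted future position of another vehicle lies
    within `radius` metres of a waypoint on the ego car's planned route,
    and 0.0 otherwise.

    route: list of (x, y) waypoints of the planned route.
    predicted_trajectories: list of trajectories, one per other vehicle,
        each a list of predicted (x, y) positions.
    """
    for trajectory in predicted_trajectories:
        for px, py in trajectory:
            for rx, ry in route:
                if math.hypot(px - rx, py - ry) <= radius:
                    return 1.0
    return 0.0


def policy_input(latent, hazard):
    """Append the scalar hazard signal to the latent vector.

    Simple concatenation is an assumed way of using the two together.
    """
    return list(latent) + [hazard]


# Illustrative data: a straight route heading north along x = 0.
route = [(0.0, float(y)) for y in range(0, 30, 5)]

# A vehicle predicted to cross the route at (0, 10).
crossing = [(-10.0 + 2.5 * t, 10.0) for t in range(6)]

# A vehicle predicted to drive parallel to the route, 8 m to the side.
parallel = [(8.0, 2.0 * t) for t in range(6)]

hazard_crossing = hazard_signal(route, [crossing])
hazard_parallel = hazard_signal(route, [parallel])
state = policy_input([0.1, -0.3], hazard_crossing)
```

In this sketch the downstream policy would receive `state`, a low-dimensional vector, instead of an image; in practice the latent vector would come from the trained encoder rather than being written by hand.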