Augmenting Reinforcement Learning with Transformer-based Scene Representation Learning for Decision-making of Autonomous Driving

Decision-making for urban autonomous driving is challenging because of the stochastic behavior of interacting traffic participants and the complexity of road structures. Although reinforcement learning (RL)-based decision-making schemes are promising for urban driving scenarios, they suffer from low sample efficiency and poor adaptability. In this paper, we propose the Scene-Rep Transformer, which improves RL decision-making through better scene representation encoding and sequential predictive latent distillation. Specifically, a multi-stage Transformer (MST) encoder is constructed to model not only interaction awareness between the ego vehicle and its neighbors but also intention awareness between the agents and their candidate routes. A sequential latent Transformer (SLT) with self-supervised learning objectives is employed to distill future predictive information into the latent scene representation, in order to reduce the exploration space and speed up training. The final decision-making module, based on soft actor-critic (SAC), takes the refined latent scene representation from the Scene-Rep Transformer as input and outputs driving actions. The framework is validated in five challenging simulated urban scenarios with dense traffic. Quantitative results show substantial improvements in data efficiency and in performance, measured by success rate, safety, and efficiency. Qualitative results reveal that the framework is able to extract the intentions of neighboring agents to inform its decisions and to deliver more diversified driving behaviors.
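
To make the two kinds of awareness in the MST encoder concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes single-head scaled dot-product attention with no learned projections, a residual sum to fuse each agent with its route context, and an ordering in which route attention runs before agent attention. The names `attention` and `encode_scene`, the feature layout, and the toy dimensions are all hypothetical.

```python
import math
import random


def attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    Returns the attended vector and the softmax weights over the keys.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights


def encode_scene(agent_feats, route_feats):
    """Toy two-stage encoder (assumed structure, for illustration only).

    agent_feats: list of feature vectors, index 0 is the ego vehicle.
    route_feats: for each agent, a list of candidate-route feature vectors.
    """
    # Stage 1 (intention awareness): each agent attends over its own candidate routes.
    fused = []
    for agent, routes in zip(agent_feats, route_feats):
        route_ctx, _ = attention(agent, routes, routes)
        fused.append([a + c for a, c in zip(agent, route_ctx)])  # residual fusion (assumption)

    # Stage 2 (interaction awareness): the ego vehicle attends over all agents, itself included.
    latent, agent_weights = attention(fused[0], fused, fused)
    return latent, agent_weights


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_agents, n_routes = 4, 3, 2
    agents = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_agents)]
    routes = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_routes)]
              for _ in range(n_agents)]
    latent, weights = encode_scene(agents, routes)
    print("latent scene vector:", latent)
    print("ego attention over agents:", weights)
```

In this sketch the returned `latent` vector stands in for the scene representation that a downstream policy such as SAC would consume, and the attention weights indicate which agents the ego vehicle attends to most. The SLT distillation step and the SAC update are deliberately left out.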
