Goal-guided Transformer-enabled Reinforcement Learning for Efficient Autonomous Navigation

Despite some successful applications in goal-driven navigation, existing deep reinforcement learning (DRL)-based approaches notoriously suffer from poor data efficiency. One reason is that the goal information is decoupled from the perception module and introduced directly as a condition for decision-making, so that goal-irrelevant features of the scene representation play an adversarial role during learning. In light of this, we present a novel Goal-guided Transformer-enabled reinforcement learning (GTRL) approach that treats the physical goal states as an input to the scene encoder, guiding the scene representation to couple with the goal information and realizing efficient autonomous navigation. More specifically, we propose a novel variant of the Vision Transformer as the backbone of the perception system, namely the Goal-guided Transformer (GoT), and pre-train it with expert priors to boost data efficiency. Subsequently, a reinforcement learning algorithm is instantiated for the decision-making system, taking the goal-oriented scene representation from the GoT as input and generating decision commands. As a result, our approach motivates the scene representation to concentrate mainly on goal-relevant features, which substantially enhances the data efficiency of the DRL process and leads to superior navigation performance. Both simulation and real-world experimental results demonstrate the superiority of our approach in terms of data efficiency, performance, robustness, and sim-to-real generalization compared with other state-of-the-art baselines. Demonstration videos are available at https://youtu.be/93LGlGvaN0c.
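
The core idea of feeding the goal state into the scene encoder, rather than only into the policy, can be illustrated with a toy sketch. The code below is not the GoT architecture from the paper. It is a minimal illustration under several assumptions: the goal is a two-element vector (distance and heading), image patches are flat vectors, the goal is embedded as one extra token prepended to the patch tokens, and a single self-attention layer with identity query/key/value projections stands in for the Transformer. All names, dimensions, and weights are hypothetical.

```python
import math
import random


def linear(x, w):
    """Multiply vector x (length n) by weight matrix w (n rows, d columns)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections, to keep the sketch short.
    Returns the attended tokens and the attention weights per query token.
    """
    d = len(tokens[0])
    out, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(d)])
    return out, weights


def random_matrix(rows, cols, rng):
    """Small random weight matrix standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def encode_scene(patches, goal, w_patch, w_goal):
    """Prepend a goal token to the patch tokens and mix them with attention.

    Returns the attended goal token, used here as the goal-conditioned scene
    feature, and the goal token's attention weights over all tokens.
    """
    goal_token = linear(goal, w_goal)
    patch_tokens = [linear(p, w_patch) for p in patches]
    tokens = [goal_token] + patch_tokens
    attended, weights = self_attention(tokens)
    return attended[0], weights[0]


# Hypothetical sizes, chosen only for illustration.
PATCH_DIM, GOAL_DIM, TOKEN_DIM, NUM_PATCHES = 16, 2, 8, 4

rng = random.Random(0)
w_patch = random_matrix(PATCH_DIM, TOKEN_DIM, rng)
w_goal = random_matrix(GOAL_DIM, TOKEN_DIM, rng)
patches = [[rng.random() for _ in range(PATCH_DIM)] for _ in range(NUM_PATCHES)]
goal = [2.5, 0.3]  # assumed goal state: distance (m), heading (rad)

feature, goal_attention = encode_scene(patches, goal, w_patch, w_goal)
print(len(feature), len(goal_attention))
```

Because the goal token takes part in attention alongside the patch tokens, the resulting feature depends on the goal as well as on the scene. In a full system this feature would be passed to the reinforcement learning policy, and the encoder weights would be learned (and, per the abstract, pre-trained with expert priors) rather than drawn at random.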
