Decision Transformer (DT) is a recently proposed architecture for reinforcement learning that frames decision-making as an auto-regressive sequence modeling problem and uses a Transformer model to predict the next action in a sequence of states, actions, and rewards. In this paper, we analyze how crucial the Transformer model is to the complete DT architecture on continuous control tasks. Specifically, we replace the Transformer with an LSTM model while keeping the other parts unchanged, obtaining what we call a Decision LSTM model. We compare it to DT on continuous control tasks, including pendulum swing-up and stabilization, both in simulation and on physical hardware. Our experiments show that DT struggles with continuous control problems such as inverted pendulum and Furuta pendulum stabilization. In contrast, the proposed Decision LSTM achieves expert-level performance on these tasks and additionally learns a swing-up controller on the real system. These results suggest that the strength of the Decision Transformer for continuous control tasks may lie in the overall sequential modeling architecture and not in the Transformer per se.
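To make the sequence-modeling framing concrete, the following is a minimal sketch of how a trajectory could be turned into the token sequence that either a Transformer or an LSTM backbone would consume. It assumes the return-to-go conditioning used in the original Decision Transformer formulation, which this abstract does not spell out. The function names `returns_to_go` and `build_sequence` and the `context_len` parameter are hypothetical, chosen for illustration only, and are not taken from the authors' code.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each step to the end of the episode."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg


def build_sequence(
    states: Sequence[object],
    actions: Sequence[object],
    rewards: Sequence[float],
    context_len: int,
) -> List[Tuple[str, object]]:
    """Interleave (return-to-go, state, action) tokens for the last
    `context_len` time steps of a trajectory.

    Assumption: the backbone (Transformer or LSTM) reads this sequence
    auto-regressively and predicts each action from the tokens before it.
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have equal length")
    rtg = returns_to_go(rewards)
    start = max(0, len(states) - context_len)
    tokens: List[Tuple[str, object]] = []
    for t in range(start, len(states)):
        tokens.append(("rtg", rtg[t]))
        tokens.append(("state", states[t]))
        tokens.append(("action", actions[t]))
    return tokens


if __name__ == "__main__":
    # Toy three-step trajectory with one-dimensional states and actions.
    seq = build_sequence(
        states=[[0.1], [0.2], [0.3]],
        actions=[[1.0], [-1.0], [0.5]],
        rewards=[1.0, 0.0, 2.0],
        context_len=2,
    )
    for kind, value in seq:
        print(kind, value)
```

Under this sketch the input pipeline is independent of the backbone, which is the sense in which the Transformer can be swapped for an LSTM while the other parts stay unchanged.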