Offline reinforcement learning (RL) is a challenging task whose objective is to learn policies from static trajectory data without interacting with the environment. Recently, offline RL has been viewed as a sequence modeling problem, in which an agent generates a sequence of subsequent actions based on a set of static transition experiences. However, existing approaches that naively use transformers to attend to all tokens can overlook the dependencies between fundamentally different tokens and limit long-term dependency learning. In this paper, we propose the Graph Decision Transformer (GDT), a novel offline RL approach that models the input sequence as a causal graph to capture potential dependencies between fundamentally different concepts and facilitate temporal and causal relationship learning. GDT uses a graph transformer with a relation-enhanced mechanism to process the graph inputs, and an optional sequence transformer to handle fine-grained spatial information in visual tasks. Our experiments show that GDT matches or surpasses the performance of state-of-the-art offline RL methods on image-based Atari and OpenAI Gym benchmarks.
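To make the core idea concrete, the following is a minimal sketch (not the authors' code) of graph-structured attention over a trajectory, under the assumption that each timestep contributes return-to-go, state, and action tokens and that attention is restricted to edges of a hand-specified causal graph. The names `build_causal_adjacency` and `RelationAwareAttention`, the particular edge set, and the scalar relation bias are hypothetical illustrations of a relation-enhanced mechanism, not the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_causal_adjacency(num_steps: int) -> torch.Tensor:
    """Adjacency over tokens ordered as (R_0, s_0, a_0, R_1, s_1, a_1, ...).

    Assumed edges per step t: R_t -> a_t, s_t -> a_t, a_t -> s_{t+1},
    plus self-loops, so each token only attends to its direct causes.
    """
    n = 3 * num_steps
    adj = torch.eye(n, dtype=torch.bool)
    for t in range(num_steps):
        r, s, a = 3 * t, 3 * t + 1, 3 * t + 2
        adj[a, r] = True          # action attends to return-to-go
        adj[a, s] = True          # action attends to current state
        if t + 1 < num_steps:
            adj[3 * (t + 1) + 1, a] = True  # next state attends to action
    return adj


class RelationAwareAttention(nn.Module):
    """Single-head attention whose logits are biased by learned edge-type
    embeddings (a simple stand-in for a relation-enhanced mechanism)."""

    def __init__(self, dim: int, num_relations: int = 4):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.rel_bias = nn.Embedding(num_relations, 1)  # one scalar bias per edge type
        self.scale = dim ** -0.5

    def forward(self, x, adj, rel_ids):
        # x: (B, N, D), adj: (N, N) bool mask, rel_ids: (N, N) long edge-type ids
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = (q @ k.transpose(-2, -1)) * self.scale
        logits = logits + self.rel_bias(rel_ids).squeeze(-1)   # relation-enhanced bias
        logits = logits.masked_fill(~adj, float("-inf"))       # keep only graph edges
        return F.softmax(logits, dim=-1) @ v


if __name__ == "__main__":
    steps, dim = 4, 32
    adj = build_causal_adjacency(steps)
    rel_ids = torch.zeros(3 * steps, 3 * steps, dtype=torch.long)  # single relation type here
    tokens = torch.randn(2, 3 * steps, dim)
    out = RelationAwareAttention(dim)(tokens, adj, rel_ids)
    print(out.shape)  # torch.Size([2, 12, 32])
```

Restricting each token's receptive field to its direct causes in this way is what distinguishes the graph-structured input from plain full attention over the flattened (return-to-go, state, action) sequence.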