Decision Transformers (DT) have demonstrated strong performance in offline reinforcement learning settings, but quickly adapting to unseen novel tasks remains challenging. To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner. To achieve this goal, we augment the base DT with an adaptation module whose parameters are initialized by a hyper-network. When encountering unseen tasks, the hyper-network takes a handful of demonstrations as input and initializes the adaptation module accordingly. This initialization enables HDT to adapt efficiently to novel tasks by fine-tuning only the adaptation module. We validate HDT's generalization capability on object manipulation tasks. We find that, with a single expert demonstration and fine-tuning only 0.5% of DT parameters, HDT adapts to unseen tasks faster than fine-tuning the whole DT model. Finally, we explore a more challenging setting where expert actions are not available, and we show that HDT outperforms state-of-the-art baselines in task success rate by a large margin.
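The core mechanism described above, a hyper-network that maps a demonstration embedding to the initial weights of a small adaptation module inserted into a frozen base model, can be illustrated with a minimal sketch. All names, dimensions, and the linear hyper-network form below are illustrative assumptions, not the paper's exact implementation; the adapter here is a simple residual bottleneck layer.

```python
import random

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class HyperAdapter:
    """Illustrative sketch (not the paper's exact architecture): a linear
    hyper-network maps an embedding of a few demonstrations to the weights
    of a small bottleneck adapter. Only the adapter would be fine-tuned on
    a novel task; the base DT stays frozen."""

    def __init__(self, demo_dim, hidden_dim, bottleneck, seed=0):
        self.hidden_dim, self.bottleneck = hidden_dim, bottleneck
        rng = random.Random(seed)
        # One linear map generating all adapter weights at once:
        # a down-projection (bottleneck x hidden) and an up-projection
        # (hidden x bottleneck), flattened into a single output vector.
        n_out = 2 * hidden_dim * bottleneck
        self.H = [[rng.uniform(-0.1, 0.1) for _ in range(demo_dim)]
                  for _ in range(n_out)]

    def init_adapter(self, demo_embedding):
        """Produce task-specific adapter weights from a demo embedding."""
        flat = matvec(self.H, demo_embedding)
        b, d = self.bottleneck, self.hidden_dim
        down = [flat[i * d:(i + 1) * d] for i in range(b)]              # b x d
        up = [flat[b * d + i * b: b * d + (i + 1) * b] for i in range(d)]  # d x b
        return down, up

def adapter_forward(h, down, up):
    """Residual adapter applied to a hidden state of the frozen model."""
    return [hi + ui for hi, ui in zip(h, matvec(up, matvec(down, h)))]

# Initialize an adapter from a (hypothetical) demonstration embedding,
# then apply it to a hidden state of matching width.
hyper = HyperAdapter(demo_dim=4, hidden_dim=8, bottleneck=2)
down, up = hyper.init_adapter([0.5, -0.2, 0.1, 0.3])
out = adapter_forward([0.1] * 8, down, up)
```

The bottleneck keeps the adapter tiny relative to the base model, which is how fine-tuning can touch only a fraction of the total parameters (0.5% in the abstract's experiments).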