Recent work has shown that Large Language Models (LLMs) can be highly effective for offline reinforcement learning (RL) when the traditional RL problem is cast as a sequence modelling problem (Chen et al., 2021; Janner et al., 2021). However, many of these methods optimize only for high returns and may fail to extract much information from a diverse dataset of trajectories. Generalized Decision Transformers (GDTs) (Furuta et al., 2021) have shown that using future trajectory information, in the form of information statistics, can help extract more information from offline trajectory data. Building on this, we propose Skill Decision Transformer (Skill DT). Skill DT draws inspiration from hindsight relabelling (Andrychowicz et al., 2017) and skill discovery methods to discover a diverse set of primitive behaviors, or skills. We show that Skill DT can not only perform offline state-marginal matching (SMM), but can also discover descriptive behaviors that can be easily sampled. Furthermore, we show that, through purely reward-free optimization, Skill DT is still competitive with supervised offline RL approaches on the D4RL benchmark. The code and videos can be found on our project page: https://github.com/shyamsn97/skill-dt
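To illustrate the sequence-modelling framing mentioned above, the following minimal sketch shows how a Decision Transformer-style model (Chen et al., 2021) might turn one offline trajectory into a flat token sequence. It relabels each timestep in hindsight with its return-to-go and then interleaves (return-to-go, state, action) triples. The function names `returns_to_go` and `to_sequence` are hypothetical. The string tokens are placeholders for the learned embeddings a real model would use. This is not the Skill DT method itself: Skill DT, as described above, is reward-free and conditions on discovered skills rather than on returns.

```python
from typing import Any, List, Tuple


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Hindsight return-to-go: R_t = sum over t' >= t of gamma^(t' - t) * r_t'."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each R_t reuses the already-computed R_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def to_sequence(
    states: List[Any], actions: List[Any], rewards: List[float]
) -> List[Tuple[str, Any]]:
    """Interleave (R_t, s_t, a_t) triples into one flat token sequence."""
    rtg = returns_to_go(rewards)
    seq: List[Tuple[str, Any]] = []
    for r, s, a in zip(rtg, states, actions):
        seq.extend([("rtg", r), ("state", s), ("action", a)])
    return seq


if __name__ == "__main__":
    states = ["s0", "s1", "s2"]
    actions = ["a0", "a1", "a2"]
    rewards = [1.0, 0.0, 2.0]
    print(returns_to_go(rewards))  # [3.0, 2.0, 2.0]
    print(to_sequence(states, actions, rewards)[:3])
```

A sequence model trained on such sequences can then be prompted at test time with a desired return-to-go. GDTs generalize this by replacing the return with other statistics of the future trajectory.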