Learning policies that effectively utilize language instructions in complex, multi-task environments is an important problem in sequential decision-making. While it is possible to condition on the entire language instruction directly, such an approach may generalize poorly to instructions unseen during training. In our work, we propose \emph{Learning Interpretable Skill Abstractions (LISA)}, a hierarchical imitation learning framework that learns diverse, interpretable primitive behaviors, or skills, from language-conditioned demonstrations in order to generalize better to unseen instructions. LISA uses vector quantization to learn discrete skill codes that are highly correlated with both the language instructions and the behavior of the learned policy. In navigation and robotic manipulation environments, LISA outperforms a strong non-hierarchical Decision Transformer baseline in the low-data regime and is able to compose learned skills to solve tasks with unseen long-range instructions. Our method demonstrates a more natural way to condition on language in sequential decision-making problems and achieves interpretable, controllable behavior with the learned skills.
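To make the vector-quantization step concrete, the following is a minimal sketch of how a continuous skill embedding can be mapped to one of a fixed set of discrete skill codes. This is an illustrative NumPy example, not the authors' implementation: the codebook size, embedding dimension, and the `quantize` helper are hypothetical choices introduced here for exposition.

```python
import numpy as np

# Minimal sketch (assumptions: codebook size K=20, dimension D=16 are illustrative,
# and NumPy stands in for the actual learned model).
rng = np.random.default_rng(0)
K, D = 20, 16
codebook = rng.normal(size=(K, D))  # discrete skill codes e_1, ..., e_K

def quantize(z):
    """Map a continuous embedding z of shape (D,) to its nearest codebook entry."""
    dists = np.linalg.norm(codebook - z, axis=1)  # distance from z to every skill code
    k = int(np.argmin(dists))                     # index of the closest code
    return k, codebook[k]

z = rng.normal(size=D)        # e.g., an encoding of the instruction and trajectory so far
k, z_q = quantize(z)
print(f"selected skill code {k}")
# In practice, gradients are usually passed through the non-differentiable argmin with a
# straight-through estimator, and a commitment loss keeps z close to its selected code.
```

The quantized code z_q would then condition a low-level policy, so that each discrete code can be inspected and correlated with the language instructions it tends to co-occur with.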