Language model (LM) pre-training has proven useful for a wide variety of language processing tasks, but can such pre-training be leveraged for more general machine learning problems? We investigate the effectiveness of language modeling as a scaffold for learning and generalization in autonomous decision-making. We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings and translated into actions by a policy network initialized with a pre-trained transformer LM. We demonstrate that this framework enables effective combinatorial generalization across different environments, such as VirtualHome and BabyAI. In particular, on test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6% in VirtualHome. We hypothesize and investigate three possible factors underlying the effectiveness of LM-based policy initialization. We find that sequential input representations (vs. fixed-dimensional feature vectors) and the LM pre-training objective (not just the transformer architecture) are both important for generalization. Surprisingly, however, the format in which policy inputs are encoded (e.g., as a natural language string vs. an arbitrary sequential encoding) has little influence. Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; such representations can aid learning and generalization even outside of language processing.
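To make the framework concrete, below is a minimal sketch of an LM-initialized policy of the kind described above: goal tokens and observation features are embedded into a single sequence, passed through a pre-trained transformer LM, and decoded into action logits. This is an illustrative reconstruction, not the paper's implementation; the GPT-2 backbone, the module names (obs_proj, action_head), and the single discrete action head are all assumptions.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model


class LMPolicy(nn.Module):
    """Illustrative policy network initialized from a pre-trained LM.

    Goals are token ids; observations are feature vectors projected
    into the LM's embedding space so that the full input is one
    sequence of embeddings. Module names are hypothetical.
    """

    def __init__(self, obs_dim: int, num_actions: int):
        super().__init__()
        self.lm = GPT2Model.from_pretrained("gpt2")  # pre-trained backbone
        hidden = self.lm.config.n_embd               # 768 for gpt2
        # Project environment observations into the LM embedding space.
        self.obs_proj = nn.Linear(obs_dim, hidden)
        # Map the final hidden state to a distribution over actions.
        self.action_head = nn.Linear(hidden, num_actions)

    def forward(self, goal_ids: torch.LongTensor, obs: torch.FloatTensor):
        # goal_ids: (batch, goal_len) token ids for the goal description
        # obs:      (batch, obs_len, obs_dim) observation features
        goal_emb = self.lm.wte(goal_ids)             # (B, G, H)
        obs_emb = self.obs_proj(obs)                 # (B, O, H)
        seq = torch.cat([goal_emb, obs_emb], dim=1)  # one embedding sequence
        out = self.lm(inputs_embeds=seq).last_hidden_state
        return self.action_head(out[:, -1])          # next-action logits
```

Under the imitation-learning setting the abstract describes, fine-tuning such a policy reduces to standard behavior cloning: a cross-entropy loss between the predicted action logits and expert actions from demonstrations.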