Recent progress in large language models (LLMs) has demonstrated the ability to learn and leverage Internet-scale knowledge through pre-training with autoregressive models. Unfortunately, applying such models to settings with embodied agents, such as robots, is challenging due to their lack of experience with the physical world, inability to parse non-language observations, and ignorance of rewards or safety constraints that robots may require. On the other hand, language-conditioned robotic policies that learn from interaction data can provide the necessary grounding that allows the agent to be correctly situated in the real world, but such policies are limited by the lack of high-level semantic understanding due to the limited breadth of the interaction data available for training them. Thus, if we want to make use of the semantic knowledge in a language model while still situating it in an embodied setting, we must construct an action sequence that is both likely according to the language model and also realizable according to grounded models of the environment. We frame this as a problem similar to probabilistic filtering: decode a sequence that has high probability under the language model and high probability under a set of grounded model objectives. We demonstrate that this guided decoding strategy is able to solve complex, long-horizon embodied tasks in a robotic setting by leveraging the knowledge of both models. The project's website can be found at grounded-decoding.github.io.
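To make the filtering analogy concrete, here is a minimal sketch of the decoding idea in Python. It assumes two hypothetical interfaces not defined in the abstract: `lm_logprobs(prefix)`, returning per-token log probabilities from the language model, and `grounded_logprobs(prefix, obs)`, returning per-token log probabilities from grounded models (e.g., affordance or safety estimates). The greedy token-level combination shown here is an illustrative assumption, not the paper's exact implementation.

```python
import numpy as np

def grounded_decode(lm_logprobs, grounded_logprobs, obs, vocab,
                    max_steps=20, eos="<eos>"):
    """Greedily decode a sequence that scores well under both models.

    Each step combines the two objectives by summing log probabilities,
    i.e., scoring tokens by the product of the language-model distribution
    and the grounded-model distribution (both interfaces are assumptions).
    """
    prefix = []
    for _ in range(max_steps):
        # Per-token scores under the joint objective.
        scores = lm_logprobs(prefix) + grounded_logprobs(prefix, obs)
        token = vocab[int(np.argmax(scores))]
        if token == eos:
            break
        prefix.append(token)
    return prefix
```

In this sketch, tokens that are fluent but infeasible in the current scene are suppressed by the grounded term, while feasible but semantically irrelevant tokens are suppressed by the language-model term.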