* Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling

Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world, which makes learning complex tasks with sparse rewards difficult. If initialized with knowledge of high-level subgoals and transitions between subgoals, RL agents could utilize this Abstract World Model (AWM) for planning and exploration. We propose using few-shot large language models (LLMs) to hypothesize an AWM that is tested and verified during exploration, to improve sample efficiency in embodied RL agents. Our DECKARD agent applies LLM-guided exploration to item crafting in Minecraft in two phases: (1) the Dream phase, where the agent uses an LLM to decompose a task into a sequence of subgoals, the hypothesized AWM; and (2) the Wake phase, where the agent learns a modular policy for each subgoal and verifies or corrects the hypothesized AWM on the basis of its experiences. Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience not only increases sample efficiency over contemporary methods by an order of magnitude but is also robust to and corrects errors in the LLM, successfully blending noisy internet-scale information from LLMs with knowledge grounded in environment dynamics.
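
The Dream/Wake loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the recipe table, the `stub_llm_predict` function (a hard-coded stand-in for a few-shot LLM query), and the `AbstractWorldModel` class are all hypothetical names introduced here. The sketch also assumes that the toy environment reveals an item's true prerequisites whenever the agent attempts it, that items are not consumed, and it omits learning a modular policy per subgoal.

```python
from dataclasses import dataclass, field

# Hypothetical ground-truth rules of a toy crafting environment (not real Minecraft data).
TRUE_RECIPES = {
    "log": set(),
    "plank": {"log"},
    "stick": {"plank"},
    "wooden_pickaxe": {"plank", "stick"},
}


def stub_llm_predict(item):
    """Stand-in for an LLM query: guessed prerequisites, two of them deliberately wrong."""
    guesses = {
        "log": set(),
        "plank": {"log"},
        "stick": set(),                 # wrong: the toy environment requires a plank
        "wooden_pickaxe": {"stick"},    # wrong: the plank requirement is missing
    }
    return set(guesses[item])


@dataclass
class AbstractWorldModel:
    prereqs: dict = field(default_factory=dict)  # item -> set of prerequisite items
    verified: set = field(default_factory=set)   # items confirmed by experience


def dream(awm, target):
    """Dream phase: hypothesize unknown nodes, return subgoals in dependency order."""
    order, seen = [], set()

    def visit(item):
        if item in seen:
            return
        seen.add(item)
        if item not in awm.prereqs:
            awm.prereqs[item] = stub_llm_predict(item)
        for dep in sorted(awm.prereqs[item]):
            visit(dep)
        order.append(item)

    visit(target)
    return order


def wake(awm, plan, inventory):
    """Wake phase: attempt each subgoal, then verify or correct the hypothesized edges."""
    for item in plan:
        if item in awm.verified:
            continue
        needed = TRUE_RECIPES[item]
        awm.prereqs[item] = set(needed)  # assumption: an attempt reveals the true rule
        if needed <= inventory:
            inventory.add(item)
            awm.verified.add(item)
        else:
            return False  # plan was based on a wrong hypothesis; replan in the next Dream
    return True


def run(target, max_iters=10):
    """Alternate Dream and Wake until the target is obtained or the budget runs out."""
    awm, inventory = AbstractWorldModel(), set()
    for iteration in range(1, max_iters + 1):
        plan = dream(awm, target)
        if wake(awm, plan, inventory):
            return awm, iteration
    return awm, None
```

Calling `run("wooden_pickaxe")` first plans from the erroneous guesses, fails at `stick`, records the corrected edge, and succeeds on the second Dream/Wake iteration with every edge verified. The point of the sketch is only that a noisy hypothesized graph remains useful when experience is allowed to overwrite it.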
