Long-term planning in complex, text-based environments presents significant challenges due to open-ended action spaces, ambiguous observations, and sparse feedback. Recent research suggests that large language models (LLMs) encode rich semantic knowledge about the world, which can be valuable for guiding agents in high-level reasoning and planning across both embodied and purely textual settings. However, existing approaches often depend heavily on querying LLMs during both training and inference, making them computationally expensive and difficult to deploy efficiently. In addition, these methods typically rely on a frozen, pretrained LLM whose parameters are never updated, leaving no opportunity to adapt to the target task. To address these limitations, we introduce SCOPE (Subgoal-COnditioned Pretraining for Efficient planning), a one-shot hierarchical planner that leverages LLM-generated subgoals only at initialization to pretrain a lightweight student model. Unlike prior approaches that distill LLM knowledge by repeatedly prompting the model to adaptively generate subgoals during training, our method derives subgoals directly from example trajectories. This design removes the need for repeated LLM queries and substantially improves efficiency, though at the cost of reduced explainability and potentially suboptimal subgoals. Despite their suboptimality, our results on the TextCraft environment show that LLM-generated subgoals can still serve as a strong starting point for hierarchical goal decomposition in text-based planning tasks. Compared to the LLM-based hierarchical agent ADaPT (Prasad et al., 2024), which achieves a 0.52 success rate, our method reaches 0.56 while reducing inference time from 164.4 seconds to just 3.0 seconds.