Embodied Active Learning of Relational State Abstractions for Bilevel Planning

State abstraction is an effective technique for planning in robotics environments with continuous states and actions, long task horizons, and sparse feedback. In object-oriented environments, predicates are a particularly useful form of state abstraction because they are compatible with symbolic planners and support relational generalization. However, to plan with predicates, the agent must be able to interpret them in continuous environment states (i.e., ground the symbols). Manually programming predicate interpretations can be difficult, so we would instead like to learn them from data. We propose an embodied active learning paradigm in which the agent learns predicate interpretations through online interaction with an expert. For example, after taking actions in a block stacking environment, the agent may ask the expert: "Is On(block1, block2) true?" From this experience, the agent learns to plan: it learns neural predicate interpretations, symbolic planning operators, and neural samplers that can be used for bilevel planning. During exploration, the agent plans to learn: it uses its current models to select actions aimed at generating informative expert queries. We learn predicate interpretations as ensembles of neural networks and use their entropy to measure the informativeness of potential queries. We evaluate this approach in three robotic environments and find that it consistently outperforms six baselines while remaining sample efficient in two key metrics: the number of environment interactions and the number of queries to the expert. Code: https://tinyurl.com/active-predicates
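The abstract says predicate interpretations are learned as ensembles of neural networks and that their entropy measures how informative a query would be. The sketch below shows one way such a criterion could work in a single state. It rests on assumptions that do not come from the paper: uncertainty is taken as the binary entropy of the ensemble-averaged probability, each candidate ground atom already has a feature vector, and the helper names (`select_query`, `logistic_member`) and the `min_entropy` threshold are hypothetical. Toy 1-D logistic functions stand in for the trained networks, and the sketch only scores queries; it does not model how the agent plans actions to reach states where such queries arise.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical types: a ground atom such as ("On", ("block1", "block2")) and
# an ensemble member mapping that atom's object features to P(atom is true).
Atom = Tuple[str, Tuple[str, ...]]
Member = Callable[[List[float]], float]


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def ensemble_entropy(members: Sequence[Member], features: List[float]) -> float:
    """Entropy of the ensemble-averaged probability that an atom holds."""
    p_mean = sum(m(features) for m in members) / len(members)
    return binary_entropy(p_mean)


def select_query(
    candidates: Dict[Atom, List[float]],
    members: Sequence[Member],
    min_entropy: float = 0.5,
) -> Optional[Atom]:
    """Return the most uncertain atom, or None if no entropy exceeds min_entropy."""
    best_atom: Optional[Atom] = None
    best_h = min_entropy
    for atom, feats in candidates.items():
        h = ensemble_entropy(members, feats)
        if h > best_h:
            best_atom, best_h = atom, h
    return best_atom


def logistic_member(w: float, b: float) -> Member:
    """Toy 1-D logistic classifier standing in for a trained neural network."""
    return lambda x: 1.0 / (1.0 + math.exp(-(w * x[0] + b)))


if __name__ == "__main__":
    # Hypothetical single feature per atom, e.g. the vertical gap between blocks.
    members = [logistic_member(-20.0, b) for b in (4.0, 6.0, 2.0)]
    candidates = {
        ("On", ("block1", "block2")): [0.0],   # members agree: likely true
        ("On", ("block2", "block3")): [1.0],   # members agree: likely false
        ("On", ("block3", "block1")): [0.15],  # members disagree
    }
    print(select_query(candidates, members))  # → ('On', ('block3', 'block1'))
```

Because the members differ only in their bias, they agree when the gap is clearly zero or clearly large and disagree near the boundary, so the ambiguous atom is the one worth asking the expert about. Entropy of the averaged prediction mixes ensemble disagreement with genuine label noise; a mutual-information score would separate the two, but nothing in the abstract indicates which variant the authors use.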
