Recent advances in robotic mobile manipulation have spurred the expansion of the operating environment for robots from constrained workspaces to large-scale human environments. To complete tasks effectively in these spaces, robots must be able to perceive, reason, and execute over a diversity of affordances, well beyond simple pick-and-place. We posit that the notion of semantic frames provides a compelling representation for robot actions that is amenable to action-focused perception, task-level reasoning, action-level execution, and integration with language. Semantic frames, a product of the linguistics community, define the elements, pre- and post-conditions, and sequence of robot actions needed to successfully execute an action evoked by a verb phrase. In this work, we extend the semantic frame representation for robot manipulation actions and introduce the problem of Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions (SEAL) as a graphical model. For the SEAL problem, we describe our nonparametric Semantic Frame Mapping (SeFM) algorithm for maintaining belief over a finite set of semantic frames as the locations of actions afforded to the robot. We show that language models such as GPT-3 are insufficient to address the generalized task execution covered by the SEAL formulation, and that SeFM provides robots with the efficient search strategies and long-term memory needed when operating in building-scale environments.
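As a rough illustration of the representation described in the abstract, a semantic frame could be sketched as a small data structure holding its frame elements, pre- and post-conditions, and an ordered action sequence. Every name below (`SemanticFrame`, `can_execute`, `apply`, the `grasp` example and its predicates) is a hypothetical placeholder chosen for illustration, not the authors' implementation, and the symbolic set-of-facts world state is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SemanticFrame:
    """Hypothetical container for one semantic frame (illustrative only)."""
    verb: str                  # verb phrase that evokes the frame
    elements: List[str]        # frame elements the action needs (e.g. objects)
    preconditions: Set[str]    # symbolic facts that must hold beforehand
    postconditions: Set[str]   # symbolic facts expected to hold afterwards
    actions: List[str] = field(default_factory=list)  # ordered primitive actions

    def can_execute(self, world_state: Set[str]) -> bool:
        # Executable only if every precondition is present in the state.
        return self.preconditions <= world_state

    def apply(self, world_state: Set[str]) -> Set[str]:
        # Simplifying assumption: postconditions are added, nothing is retracted.
        if not self.can_execute(world_state):
            raise ValueError(f"preconditions of '{self.verb}' not satisfied")
        return world_state | self.postconditions


# Assumed example frame; the predicate and action names are invented.
grasp = SemanticFrame(
    verb="grasp",
    elements=["cup"],
    preconditions={"cup_visible", "gripper_empty"},
    postconditions={"holding_cup"},
    actions=["move_to_pregrasp", "close_gripper"],
)
state = {"cup_visible", "gripper_empty"}
new_state = grasp.apply(state)
```

A structure like this only captures the symbolic side of a frame; localizing where a frame is afforded, which is the role the abstract assigns to SeFM, would require a separate belief over locations that this sketch does not attempt.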