Semantic constraints to represent common sense required in household actions for multi-modal learning-from-observation robot

The paradigm of learning-from-observation (LfO) enables a robot to learn how to perform actions by observing human-demonstrated actions. Previous research in LfO has mainly focused on the industrial domain, which involves only the observable physical constraints between a manipulated tool and the robot's working environment. To extend this paradigm to the household domain, which also involves non-observable constraints derived from human common sense, we introduce the idea of semantic constraints. Semantic constraints are represented in the same way as physical constraints, by defining contact with an imaginary semantic environment. We thoroughly investigate the necessary and sufficient set of contact states and state transitions to understand the different types of physical and semantic constraints. We then apply our constraint representation to analyze various actions in top-hit household YouTube videos and real home cooking recordings. We further categorize the frequently appearing constraint patterns into physical, semantic, and multistage task groups and verify that these groups are not only necessary but also sufficient for covering standard household actions. Finally, we conduct a preliminary experiment using textual input to explore the possibility of combining verbal and visual input for recognizing the task groups. Our results provide promising directions for incorporating common sense into the robot teaching literature.
