Objects are made of parts, each with distinct geometry, physics, functionality, and affordances. Developing such a distributed, physical, interpretable representation of objects will help intelligent agents better explore and interact with the world. In this paper, we study physical primitive decomposition: understanding an object through its components, each with physical and geometric attributes. As annotated data for object parts and physics are rare, we propose a novel formulation that learns physical primitives by explaining both an object's appearance and its behaviors in physical events. Our model performs well on block towers and tools in both synthetic and real scenarios; we also demonstrate that visual and physical observations often provide complementary signals. We further present ablation and behavioral studies to better understand our model and contrast it with human performance.