We study the problem of unsupervised physical object discovery. While existing frameworks aim to decompose scenes into 2D segments based on each object's appearance, we explore how physics, especially object interactions, facilitates disentangling the 3D geometry and position of objects from video in an unsupervised manner. Drawing inspiration from developmental psychology, our Physical Object Discovery Network (POD-Net) uses both multi-scale pixel cues and physical motion cues to accurately segment observable and partially occluded objects of varying sizes, and to infer properties of those objects. Our model reliably segments objects in both synthetic and real scenes. The discovered object properties can also be used to reason about physical events.