Understanding the 3D world from 2D images involves more than detecting and segmenting the objects in a scene. It also requires interpreting the structure and arrangement of the scene's elements. Such understanding is often rooted in knowledge of the physical world and its constraints, and in prior knowledge of how similar, typical scenes are arranged. In this research we pose a new challenge for scene-understanding algorithms, whether based on neural networks or otherwise: can they distinguish between plausible and implausible scenes? Plausibility can be defined both in terms of physical properties and in terms of functional and typical arrangements. We therefore define plausibility as the probability of encountering a given scene in the real physical world. We build a dataset of synthetic images containing both plausible and implausible scenes, and use it to evaluate how well various vision models recognize and understand plausibility.