PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies. Such composite structures can induce a rich set of semantic concepts and relations, and thus play an important role in the interpretation and organization of visual signals as well as in the generalization of visual perception and reasoning. However, existing visual reasoning benchmarks mostly focus on objects rather than parts. Visual reasoning based on the full part-whole hierarchy is much more challenging than object-centric reasoning because of finer-grained concepts, richer geometric relations, and more complex physics. Therefore, to better support part-based conceptual, relational, and physical reasoning, we introduce a new large-scale diagnostic visual reasoning dataset named PTR. PTR contains around 70k synthetic RGB-D images with ground-truth object-level and part-level annotations covering semantic instance segmentation, color attributes, spatial and geometric relationships, and certain physical properties such as stability. These images are paired with 700k machine-generated questions covering a variety of reasoning types, making the dataset a good testbed for visual reasoning models. We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes in situations where humans can easily infer the correct answer. We believe this dataset will open up new opportunities for part-based reasoning.

