Humans have the remarkable ability to perceive objects as a whole, even when parts of them are occluded. This ability, known as amodal perception, forms the basis of our perceptual and cognitive understanding of the world. To enable robots to reason with this capability, we formulate and propose a novel task that we name amodal panoptic segmentation. The goal of this task is to simultaneously predict the pixel-wise semantic segmentation labels of the visible regions of stuff classes and the instance segmentation labels of both the visible and occluded regions of thing classes. To facilitate research on this new task, we extend two established benchmark datasets with pixel-level amodal panoptic segmentation labels that we make publicly available as KITTI-360-APS and BDD100K-APS. We present several strong baselines, along with the amodal panoptic quality (APQ) and amodal parsing coverage (APC) metrics to quantify the performance in an interpretable manner. Furthermore, we propose the novel amodal panoptic segmentation network (APSNet) as a first step towards addressing this task by explicitly modeling the complex relationships between the occluders and occludees. Extensive experimental evaluations demonstrate that APSNet achieves state-of-the-art performance on both benchmarks and, more importantly, exemplifies the utility of amodal recognition. The benchmarks are available at http://amodal-panoptic.cs.uni-freiburg.de.
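As a rough illustration of how a panoptic-quality-style metric such as APQ can be evaluated over amodal instance masks, the sketch below scores one thing class by matching predicted and ground-truth amodal masks (visible plus occluded pixels) at IoU > 0.5. This is a minimal sketch under assumed conventions; the exact APQ and APC definitions, including the treatment of stuff classes and occlusion-specific terms, follow the paper, and all function names here are hypothetical.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean amodal masks (visible + occluded pixels)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / union if union > 0 else 0.0

def pq_style_score(pred_masks, gt_masks, iou_thresh=0.5):
    """PQ-style quality for one thing class.

    Segments with IoU above the threshold count as true positives;
    the score is sum(IoU of matches) / (TP + 0.5*FP + 0.5*FN).
    """
    matched_gt, tp_ious = set(), []
    for pred in pred_masks:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_masks):
            if j in matched_gt:
                continue
            iou = mask_iou(pred, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou > iou_thresh:
            matched_gt.add(best_j)
            tp_ious.append(best_iou)
    tp = len(tp_ious)
    fp = len(pred_masks) - tp
    fn = len(gt_masks) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(tp_ious) / denom if denom > 0 else 0.0

# Toy usage: two ground-truth amodal masks, one correct and one missed prediction.
gt = [np.zeros((4, 4), bool), np.zeros((4, 4), bool)]
gt[0][:2, :2] = True
gt[1][2:, 2:] = True
pred = [gt[0].copy()]
print(pq_style_score(pred, gt))  # 1.0 IoU on one match, one FN -> 1.0 / 1.5
```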