Amodal panoptic segmentation aims to connect the perception of the world to its cognitive understanding. It entails simultaneously predicting the semantic labels of visible scene regions and the entire shape of traffic participant instances, including regions that may be occluded. In this work, we formulate a proposal-free framework that tackles this task as a multi-label and multi-class problem: it first assigns the amodal masks to different layers according to their relative occlusion order, and then performs amodal instance regression on each layer independently while learning background semantics. We propose the \net architecture, which incorporates a shared backbone and an asymmetrical dual decoder consisting of several modules that facilitate within-scale and cross-scale feature aggregation, bilateral feature propagation between the decoders, and the integration of global instance-level and local pixel-level occlusion reasoning. Further, we propose the amodal mask refiner, which resolves ambiguity in complex occlusion scenarios by explicitly leveraging the embedding of unoccluded instance masks. Extensive evaluations on the BDD100K-APS and KITTI-360-APS datasets demonstrate that our approach sets a new state of the art on both benchmarks.
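The abstract does not specify how amodal masks are distributed across layers, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that a front-to-back occlusion order of the instances is already given, that two instances conflict when their amodal masks overlap, and that each instance is greedily placed one layer behind its deepest overlapping occluder. This guarantees that masks within a layer never overlap, so each layer can be encoded as a plain instance-ID map while a pixel may still belong to several instances across layers (the multi-label property). The helper names `assign_layers` and `layer_maps` are hypothetical.

```python
import numpy as np


def assign_layers(masks, order):
    """Greedily assign each amodal instance mask to an occlusion layer.

    masks: list of HxW boolean arrays, one amodal mask per instance.
    order: instance indices sorted front-to-back (assumed to be given).
    Returns the layer index of every instance (0 = front-most layer).
    """
    layers = [-1] * len(masks)
    placed = []
    for i in order:
        # Layers of already placed (nearer) instances that overlap instance i.
        blocking = [layers[j] for j in placed if np.any(masks[i] & masks[j])]
        # Place i one layer behind its deepest overlapping occluder, so no
        # two overlapping masks ever share a layer.
        layers[i] = max(blocking) + 1 if blocking else 0
        placed.append(i)
    return layers


def layer_maps(masks, layers):
    """Build a (K, H, W) stack of per-layer instance-ID maps (0 = no instance)."""
    if not masks:
        return np.zeros((0, 0, 0), dtype=np.int32)
    h, w = masks[0].shape
    maps = np.zeros((max(layers) + 1, h, w), dtype=np.int32)
    for idx, (mask, k) in enumerate(zip(masks, layers)):
        maps[k][mask] = idx + 1  # instance IDs start at 1
    return maps


# Toy example: instances A and B overlap, C overlaps neither.
a = np.zeros((4, 4), dtype=bool); a[0:3, 0:3] = True
b = np.zeros((4, 4), dtype=bool); b[1:4, 1:4] = True
c = np.zeros((4, 4), dtype=bool); c[3, 0] = True

layers = assign_layers([a, b, c], order=[0, 1, 2])  # A is in front of B
print(layers)  # [0, 1, 0]
maps = layer_maps([a, b, c], layers)
print(maps.shape)  # (2, 4, 4); pixel (1, 1) is A in layer 0 and B in layer 1
```

Placing an instance behind its deepest overlapping occluder, rather than in the first free layer, keeps the layer index consistent with the given occlusion order; a real system would still need to derive that order from annotations or predictions.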