Understanding the scene in which an autonomous robot operates is critical for its competent functioning. Such scene comprehension necessitates recognizing instances of traffic participants along with general scene semantics, which can be effectively addressed by the panoptic segmentation task. In this paper, we introduce the Efficient Panoptic Segmentation (EfficientPS) architecture that consists of a shared backbone which efficiently encodes and fuses semantically rich multi-scale features. We incorporate a new semantic head that coherently aggregates fine and contextual features, and a new variant of Mask R-CNN as the instance head. We also propose a novel panoptic fusion module that congruously integrates the output logits from both heads of our EfficientPS architecture to yield the final panoptic segmentation output. Additionally, we introduce the KITTI panoptic segmentation dataset, which contains panoptic annotations for the popular yet challenging KITTI benchmark. Extensive evaluations on Cityscapes, KITTI, Mapillary Vistas, and the Indian Driving Dataset demonstrate that our proposed architecture consistently sets the new state of the art on all four benchmarks while being the most efficient and fastest panoptic segmentation architecture to date.
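To make the fusion step concrete, the following is a minimal sketch of how semantic and instance predictions can be combined into a single panoptic map. It is a simplified stand-in for the paper's fusion module, not the method itself: it merges hard labels rather than adaptively fusing logits, and the function name, score threshold, and `class * label_divisor + id` encoding are all illustrative assumptions.

```python
import torch

def simple_panoptic_fusion(sem_logits, inst_mask_logits, inst_classes,
                           inst_scores, label_divisor=1000, score_thresh=0.5):
    """Illustrative sketch (not the paper's exact fusion rule).

    sem_logits:       (C, H, W) per-class semantic logits
    inst_mask_logits: (N, H, W) per-instance mask logits
    inst_classes:     (N,) thing-class id of each instance
    inst_scores:      (N,) confidence of each instance
    Returns a (H, W) panoptic map encoded as class * label_divisor + id.
    """
    semantic = sem_logits.argmax(dim=0)          # (H, W) semantic class map
    panoptic = semantic * label_divisor          # stuff regions keep id 0
    occupied = torch.zeros_like(semantic, dtype=torch.bool)
    next_id = 1
    # Resolve overlaps greedily: paste higher-scoring instances first.
    for i in inst_scores.argsort(descending=True):
        if inst_scores[i] < score_thresh:
            break                                 # remaining scores are lower
        mask = (inst_mask_logits[i].sigmoid() > 0.5) & ~occupied
        # Require agreement with the semantic head on the class, a crude
        # stand-in for the logit-level consensus in the actual module.
        mask &= semantic == inst_classes[i]
        if mask.any():
            panoptic[mask] = inst_classes[i] * label_divisor + next_id
            occupied |= mask
            next_id += 1
    return panoptic
```

In the architecture described above, the fusion instead operates on the mask logits from both heads, attenuating or amplifying them based on their agreement before resolving overlaps; the hard-label version here only conveys the overall control flow.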