In this work, we introduce panoramic panoptic segmentation as the most holistic scene understanding, both in terms of Field of View (FoV) and image-level understanding for standard camera-based input. A complete surrounding understanding provides a mobile agent with a maximum of information, which is essential for any intelligent vehicle to make informed decisions in a safety-critical dynamic environment such as real-world traffic. In order to overcome the lack of annotated panoramic images, we propose a framework which allows model training on standard pinhole images and transfers the learned features to the panoramic domain in a cost-minimizing way. The domain shift from pinhole to panoramic images is non-trivial, as large objects and surfaces are heavily distorted close to the image border regions and look different across the two domains. Using our proposed method with dense contrastive learning, we achieve significant improvements over a non-adapted approach. Depending on the underlying efficient panoptic segmentation architecture, we improve by 3.5-6.5% measured in Panoptic Quality (PQ) over non-adapted models on our established Wild Panoramic Panoptic Segmentation (WildPPS) dataset. Furthermore, our efficient framework does not require access to images of the target domain, making it a feasible domain generalization approach suitable for limited hardware settings. As additional contributions, we publish WildPPS, the first panoramic panoptic image dataset, to foster progress in surrounding perception, and we explore a novel training procedure combining supervised and contrastive training.
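For context, Panoptic Quality is the standard metric of Kirillov et al. for panoptic segmentation: with predicted and ground-truth segments matched into true positives (TP), false positives (FP), and false negatives (FN), it is defined as

$$\mathrm{PQ} = \frac{\sum_{(p,g) \in \mathit{TP}} \mathrm{IoU}(p,g)}{|\mathit{TP}| + \tfrac{1}{2}|\mathit{FP}| + \tfrac{1}{2}|\mathit{FN}|}.$$

The abstract names dense contrastive learning as the adaptation mechanism but does not spell out the objective. Below is a minimal PyTorch sketch of a pixel-level InfoNCE loss in the spirit of dense contrastive learning; the function name, tensor shapes, and the choice of spatially aligned pixels as positives are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dense_contrastive_loss(feats_a: torch.Tensor,
                           feats_b: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Illustrative pixel-wise InfoNCE loss between two dense feature maps.

    feats_a, feats_b: (B, C, H, W) features of two views of the same scenes;
    spatially corresponding pixels act as positive pairs and every other
    pixel in the batch serves as a negative (an assumption, not necessarily
    the paper's positive/negative sampling scheme).
    """
    B, C, H, W = feats_a.shape
    # Flatten to (B*H*W, C) and L2-normalize so dot products are cosine similarities.
    a = F.normalize(feats_a.permute(0, 2, 3, 1).reshape(-1, C), dim=1)
    b = F.normalize(feats_b.permute(0, 2, 3, 1).reshape(-1, C), dim=1)
    # Similarity of every pixel in view A to every pixel in view B.
    logits = a @ b.t() / temperature  # (N, N) with N = B*H*W
    # The pixel at the same spatial location in the other view is the positive.
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

# Usage with random tensors standing in for backbone features of two views.
fa, fb = torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8)
loss = dense_contrastive_loss(fa, fb)
```

Treating the diagonal of the similarity matrix as the positive class is the standard InfoNCE construction; in practice one would subsample pixels or downsample the feature maps to keep the N×N logits matrix tractable.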