Unified panoptic segmentation methods are achieving state-of-the-art results on several datasets. To achieve these results on high-resolution datasets, these methods apply crop-based training. In this work, we find that, although crop-based training is advantageous in general, it also has a harmful side effect. Specifically, it limits the ability of unified networks to discriminate between large object instances, causing them to make predictions that are confused between multiple instances. To solve this, we propose Intra-Batch Supervision (IBS), which improves a network's ability to discriminate between instances by introducing additional supervision using multiple images from the same batch. We show that IBS successfully addresses the confusion problem and consistently improves the performance of unified networks. For the high-resolution Cityscapes and Mapillary Vistas datasets, we achieve improvements of up to +2.5 in Panoptic Quality for thing classes, and even larger gains of up to +5.8 in both pixel accuracy and pixel precision, which we identify as better metrics to capture the confusion problem.
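For readers unfamiliar with the Panoptic Quality (PQ) metric cited above, the following is a minimal sketch of the standard per-class PQ definition, in which a predicted segment and a ground-truth segment form a true positive only when their IoU exceeds 0.5. It is not the evaluation code used in this work; the function name `panoptic_quality` and its arguments are hypothetical, chosen for illustration. The pixel accuracy and pixel precision metrics mentioned in the abstract are defined in the paper itself and are not reproduced here.

```python
from typing import Sequence


def panoptic_quality(tp_ious: Sequence[float], num_fp: int, num_fn: int) -> float:
    """Standard per-class PQ (illustrative sketch, not the authors' code).

    tp_ious: IoU of each matched (prediction, ground truth) segment pair.
             Only pairs with IoU > 0.5 are valid matches.
    num_fp:  number of predicted segments left unmatched.
    num_fn:  number of ground-truth segments left unmatched.
    """
    for iou in tp_ious:
        if not 0.5 < iou <= 1.0:
            raise ValueError("a true-positive match requires 0.5 < IoU <= 1")
    if num_fp < 0 or num_fn < 0:
        raise ValueError("counts must be non-negative")

    # PQ = (sum of IoUs over true positives) / (|TP| + 0.5 |FP| + 0.5 |FN|)
    denominator = len(tp_ious) + 0.5 * num_fp + 0.5 * num_fn
    if denominator == 0:
        # No segments of this class in prediction or ground truth;
        # returning 0.0 here is a convention assumed for this sketch.
        return 0.0
    return sum(tp_ious) / denominator


if __name__ == "__main__":
    # Two matched segments, one spurious prediction, one missed instance.
    print(panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1))
```

PQ for thing classes is then the average of this per-class value over the thing classes of the dataset.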