Analyzing complex scenes with deep neural networks is a challenging task, particularly when images contain multiple objects that partially occlude each other. Existing approaches to image analysis mostly process objects independently and do not take into account the relative occlusion of nearby objects. In this paper, we propose a deep network for multi-object instance segmentation that is robust to occlusion and can be trained from bounding box supervision only. Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders and to classify objects based on their non-occluded parts. We extend their generative model to include multiple objects and introduce a framework for efficient inference in challenging occlusion scenarios. In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations. We introduce an Occlusion Reasoning Module (ORM) that locates erroneous segmentations and estimates the occlusion order to correct them. The improved segmentation masks are, in turn, integrated into the network in a top-down manner to improve image classification. Our experiments on the KITTI INStance dataset (KINS) and a synthetic occlusion dataset demonstrate the effectiveness and robustness of our model at multi-object instance segmentation under occlusion. Code is publicly available at https://github.com/XD7479/Multi-Object-Occlusion.
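To make the idea of occlusion-order reasoning concrete, the following is a minimal toy sketch, not the ORM described above. It assumes two predicted instance masks that overlap and a hypothetical per-pixel score map for each object indicating how strongly the pixel supports that object's visible surface. The function name `resolve_overlap`, the score maps, and the mean-score ordering rule are all illustrative assumptions rather than anything taken from the paper or its released code.

```python
import numpy as np


def resolve_overlap(mask_a, mask_b, score_a, score_b):
    """Toy occlusion-order reasoning for two overlapping instance masks.

    mask_a, mask_b: boolean arrays of shape (H, W), predicted instance masks.
    score_a, score_b: float arrays of shape (H, W); hypothetical per-pixel
        evidence that the pixel shows the visible surface of that object.
    Returns (visible_a, visible_b, front), where front is "a", "b", or None
    when the masks do not overlap.
    """
    overlap = mask_a & mask_b
    if not overlap.any():
        # Nothing to correct: the masks are already disjoint.
        return mask_a.copy(), mask_b.copy(), None

    # Assumed ordering rule: the object with higher mean evidence inside
    # the contested region is treated as the one in front.
    a_in_front = score_a[overlap].mean() >= score_b[overlap].mean()

    if a_in_front:
        # Object b is occluded in the overlap, so remove those pixels from b.
        return mask_a.copy(), mask_b & ~overlap, "a"
    return mask_a & ~overlap, mask_b.copy(), "b"


# Two 4x6 masks that share columns 2 and 3.
mask_a = np.zeros((4, 6), dtype=bool)
mask_a[:, :4] = True
mask_b = np.zeros((4, 6), dtype=bool)
mask_b[:, 2:] = True

# Object a has uniformly stronger evidence, so it should end up in front.
score_a = np.full((4, 6), 0.9)
score_b = np.full((4, 6), 0.2)

vis_a, vis_b, front = resolve_overlap(mask_a, mask_b, score_a, score_b)
```

In this toy case the contested pixels are assigned to object `a`, and the visible mask of object `b` shrinks to the columns that `a` does not cover. A real system would derive the ordering from learned occluder and instance likelihoods rather than a simple mean.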