In object detection, post-processing methods such as Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections, but it may still keep some detections with low objectness scores. To determine the exact number of objects in an image and their labels, we propose a post-processing method called the Detection Selection Algorithm (DSA), which is applied after NMS or related methods. DSA greedily selects a subset of the detected bounding boxes, together with full object reconstructions, that gives the interpretation of the whole image with the highest likelihood, taking object occlusions into account. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain the occlusion relationships between objects. Second, we develop a single reconstruction algorithm that reconstructs the whole appearance of an object from its visible part by optimizing the latent variables of a trained generative network, which we call the decoder. Third, we propose a whole reconstruction algorithm that generates the joint reconstruction of all objects in a hypothesized interpretation, taking the occlusion ordering into account. Finally, we propose a greedy algorithm that incrementally adds detections to, or removes them from, a list in order to maximize the likelihood of the corresponding interpretation. Our experiments on synthetic images with multiple 3D objects show that DSA combined with NMS or Soft-NMS achieves better results than NMS or Soft-NMS alone.
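The greedy add-or-remove step and the occlusion-ordered joint reconstruction can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes a one-dimensional "image" of integer labels, flat-colored objects in place of decoder reconstructions, and a mismatch count with a per-detection penalty as a stand-in for the likelihood. The names `render`, `make_score`, `greedy_select`, and the `(label, start, end, depth)` record are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical detection record: (label, start, end, depth).
# A larger depth means the object is farther from the camera.
Detection = Tuple[int, int, int, int]


def render(selected: FrozenSet[str], detections: Dict[str, Detection],
           width: int) -> List[int]:
    """Toy joint reconstruction: paint objects back to front so that
    nearer objects overwrite (occlude) farther ones. 0 is background."""
    canvas = [0] * width
    order = sorted(selected, key=lambda n: detections[n][3], reverse=True)
    for name in order:
        label, start, end, _ = detections[name]
        for x in range(start, end):
            canvas[x] = label
    return canvas


def make_score(observed: List[int], detections: Dict[str, Detection],
               penalty: float = 0.5) -> Callable[[FrozenSet[str]], float]:
    """Stand-in for the likelihood (an assumption of this sketch):
    fewer mismatched pixels and fewer detections score higher."""
    def score(selected: FrozenSet[str]) -> float:
        canvas = render(selected, detections, len(observed))
        mismatches = sum(a != b for a, b in zip(canvas, observed))
        return -mismatches - penalty * len(selected)
    return score


def greedy_select(candidates: List[str],
                  score: Callable[[FrozenSet[str]], float]
                  ) -> Tuple[Set[str], float]:
    """Repeatedly toggle (add or remove) the single detection that most
    improves the score; stop when no toggle improves it."""
    selected: Set[str] = set()
    best = score(frozenset(selected))
    while True:
        best_move, best_score = None, best
        for c in candidates:
            trial = frozenset(selected ^ {c})  # add if absent, else remove
            s = score(trial)
            if s > best_score:
                best_move, best_score = c, s
        if best_move is None:
            return selected, best
        selected ^= {best_move}
        best = best_score


# Object A (far) is partly occluded by object B (near); C is spurious.
detections = {"A": (1, 1, 4, 2), "B": (2, 3, 6, 1), "C": (1, 5, 8, 3)}
observed = [0, 1, 1, 2, 2, 2, 0, 0]
score = make_score(observed, detections)
selected, best = greedy_select(list(detections), score)
print(sorted(selected), best)  # ['A', 'B'] -1.0
```

In this toy case the search first adds B, then A, and rejects C because rendering it would introduce mismatched pixels. The loop terminates because the score strictly increases at every step and the number of subsets is finite.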