In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections but may still keep some detections with low objectness scores. To find the exact number of objects and their labels in the image, we propose a post-processing method called Detection Selection Algorithm (DSA), which is applied after NMS or related methods. DSA greedily selects a subset of the detected bounding boxes, together with full object reconstructions, that gives the interpretation of the whole image with the highest likelihood, taking object occlusions into account. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain occlusion relationships between objects. Second, we develop a single reconstruction algorithm that reconstructs the whole appearance of an object from its visible part by optimizing the latent variables of a trained generative network, which we call the decoder. Third, we propose a whole reconstruction algorithm that generates the joint reconstruction of all objects in a hypothesized interpretation, taking occlusion ordering into account. Finally, we propose a greedy algorithm that incrementally adds or removes detections from a list to maximize the likelihood of the corresponding interpretation. DSA combined with NMS or Soft-NMS achieves better results than NMS or Soft-NMS alone, as illustrated in our experiments on synthetic images with multiple 3D objects.
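The greedy add-or-remove selection and the occlusion-ordered joint reconstruction described above can be illustrated with a small toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the image is a 1D list of pixels, each hypothetical detection carries a fixed constant-intensity reconstruction in place of a decoder output, the depth ordering is given rather than predicted by an occlusion branch, and the likelihood is a Gaussian log-likelihood up to a constant (negative sum of squared errors). The names `composite`, `log_likelihood`, and `greedy_select` are invented here.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy detection: (start, end, intensity, depth).
# A larger depth means the object is farther from the camera.
Detection = Tuple[int, int, float, int]


def composite(names: FrozenSet[str], detections: Dict[str, Detection],
              width: int) -> List[float]:
    """Paint the selected reconstructions back to front on a zero background."""
    image = [0.0] * width
    ordered = sorted(names, key=lambda n: (detections[n][3], n), reverse=True)
    for name in ordered:  # farthest first, so nearer objects overwrite
        start, end, value, _ = detections[name]
        for x in range(start, end):
            image[x] = value
    return image


def log_likelihood(names: FrozenSet[str], detections: Dict[str, Detection],
                   observed: List[float]) -> float:
    """Assumed score: negative sum of squared errors against the observation."""
    rendered = composite(names, detections, len(observed))
    return -sum((r - o) ** 2 for r, o in zip(rendered, observed))


def greedy_select(detections: Dict[str, Detection],
                  observed: List[float]) -> Tuple[FrozenSet[str], float]:
    """Toggle one detection at a time while the score strictly improves."""
    selected: FrozenSet[str] = frozenset()
    best = log_likelihood(selected, detections, observed)
    while True:
        best_trial = None
        for name in detections:
            # Remove the detection if it is selected, otherwise add it.
            trial = selected - {name} if name in selected else selected | {name}
            score = log_likelihood(trial, detections, observed)
            if score > best:
                best, best_trial = score, trial
        if best_trial is None:  # no single toggle improves the score
            return selected, best
        selected = best_trial


# Toy scene: far object "A" is partly occluded by near object "B";
# "C" is a false positive that survived an earlier filtering stage.
detections = {
    "A": (1, 5, 1.0, 2),
    "B": (3, 7, 2.0, 1),
    "C": (0, 2, 3.0, 0),
}
observed = [0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 0.0]

chosen, score = greedy_select(detections, observed)
print(sorted(chosen), score)
```

In this toy scene the search first picks `B`, then adds `A` because painting it behind `B` explains the remaining pixels, and never adds `C`, since doing so lowers the score. Each accepted move strictly increases the score over a finite set of subsets, so the loop terminates, though only at a local optimum.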