It is challenging for weakly supervised object detection networks to precisely predict the positions of objects, since no instance-level category annotations are available. Most existing methods tackle this problem with a two-phase learning procedure, i.e., a multiple instance learning (MIL) detector followed by a fully supervised detector with bounding-box regression. Based on our observations, this procedure may lead to local minima for some object categories. In this paper, we propose to jointly train the two phases in an end-to-end manner to address this problem. Specifically, we design a single network with both MIL and bounding-box regression branches that share the same backbone. Meanwhile, a guided attention module, supervised by a classification loss, is added to the backbone to effectively extract the implicit location information in the features. Experimental results on public datasets show that our method achieves state-of-the-art performance.
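As a minimal sketch of what a single joint objective could look like, the Python below combines three terms computed for one image: a multiple instance learning loss on image-level labels, a bounding-box regression loss against pseudo ground-truth boxes mined from the MIL scores, and a classification loss for the attention branch. The abstract does not specify the actual formulation. The WSDDN-style two-stream MIL scoring, the top-scoring-proposal pseudo labels, the 0.5 IoU threshold, the smooth L1 loss, the loss weights and every function name here are assumptions for illustration only. Plain Python is used so the arithmetic stays visible; in a deep learning framework the same scalar would be backpropagated through the shared backbone, which is what makes the training end-to-end.

```python
import math

# Hypothetical sketch: all loss definitions below are assumptions, not the paper's.


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mil_image_scores(cls_logits, det_logits):
    """Assumed WSDDN-style two-stream MIL head.

    cls_logits, det_logits: R x C nested lists (R proposals, C classes).
    Returns per-proposal scores (R x C) and image-level scores (length C).
    """
    num_props, num_classes = len(cls_logits), len(cls_logits[0])
    cls_prob = [softmax(row) for row in cls_logits]  # softmax over classes
    det_prob = [softmax([det_logits[r][c] for r in range(num_props)])
                for c in range(num_classes)]  # softmax over proposals, per class
    prop_scores = [[cls_prob[r][c] * det_prob[c][r] for c in range(num_classes)]
                   for r in range(num_props)]
    image_scores = [sum(prop_scores[r][c] for r in range(num_props))
                    for c in range(num_classes)]
    return prop_scores, image_scores


def bce(probs, labels, eps=1e-6):
    """Mean binary cross-entropy against image-level 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def encode(box, gt):
    """R-CNN style (dx, dy, dw, dh) regression target from box to gt."""
    bw, bh = box[2] - box[0], box[3] - box[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    bx, by = box[0] + 0.5 * bw, box[1] + 0.5 * bh
    gx, gy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    return ((gx - bx) / bw, (gy - by) / bh, math.log(gw / bw), math.log(gh / bh))


def smooth_l1(d, beta=1.0):
    d = abs(d)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta


def regression_loss(proposals, reg_deltas, prop_scores, labels, iou_thr=0.5):
    """Regress proposals toward pseudo ground truth mined from the MIL scores.

    Assumption: for each class present in the image, the top-scoring proposal
    is the pseudo box, and proposals with IoU >= iou_thr to it are positives.
    reg_deltas holds one class-agnostic (dx, dy, dw, dh) prediction per proposal.
    """
    losses = []
    for c, y in enumerate(labels):
        if not y:
            continue
        top = max(range(len(proposals)), key=lambda r: prop_scores[r][c])
        pseudo_gt = proposals[top]
        for r, box in enumerate(proposals):
            if iou(box, pseudo_gt) >= iou_thr:
                target = encode(box, pseudo_gt)
                losses.append(sum(smooth_l1(p - t) for p, t in zip(reg_deltas[r], target)))
    return sum(losses) / len(losses) if losses else 0.0


def joint_loss(cls_logits, det_logits, proposals, reg_deltas, att_logits, labels,
               lam_reg=1.0, lam_att=1.0):
    """One scalar for end-to-end training: MIL + box regression + attention terms."""
    prop_scores, image_scores = mil_image_scores(cls_logits, det_logits)
    l_mil = bce(image_scores, labels)
    l_reg = regression_loss(proposals, reg_deltas, prop_scores, labels)
    # The attention branch is assumed to output image-level class logits.
    att_probs = [1.0 / (1.0 + math.exp(-z)) for z in att_logits]
    l_att = bce(att_probs, labels)
    return l_mil + lam_reg * l_reg + lam_att * l_att
```

In this sketch the regression targets are recomputed from the current MIL scores at every step instead of being fixed by a separately trained first-phase detector, which is the contrast with the two-phase procedure that the abstract draws.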