Object detectors trained with weak annotations are affordable alternatives to their fully-supervised counterparts. However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a base pre-trained weakly-supervised detector with a few fully-annotated samples automatically selected from the training set using ``box-in-box'' (BiB), a novel active learning strategy designed specifically to address the well-documented failure modes of weakly-supervised detectors. Experiments on the VOC07 and COCO benchmarks show that BiB outperforms other active learning techniques and significantly improves the base weakly-supervised detector's performance with only a few fully-annotated images per class. BiB reaches 97% of the performance of fully-supervised Fast RCNN with only 10% of fully-annotated images on VOC07. On COCO, using on average 10 fully-annotated images per class, or equivalently 1% of the training set, BiB also reduces the performance gap (in AP) between the weakly-supervised detector and the fully-supervised Fast RCNN by over 70%, showing a good trade-off between performance and data efficiency. Our code is publicly available at https://github.com/huyvvo/BiB.
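To make the selection idea concrete, below is a minimal sketch (not the authors' implementation) of a box-in-box style criterion: images are ranked by how many of their confident detections enclose another confident detection of the same class, a pattern symptomatic of weakly-supervised detectors firing on object parts, and the top-ranked images are sent for full annotation before fine-tuning. All names here (`Detection`, `select_images_to_annotate`, the score threshold) are illustrative assumptions, not the released API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    box: tuple    # (x1, y1, x2, y2)
    score: float  # detector confidence
    label: int    # predicted class


def contains(outer: tuple, inner: tuple, margin: float = 0.0) -> bool:
    """True if `inner` lies inside `outer` (with an optional margin)."""
    ox1, oy1, ox2, oy2 = outer
    ix1, iy1, ix2, iy2 = inner
    return (ix1 >= ox1 - margin and iy1 >= oy1 - margin and
            ix2 <= ox2 + margin and iy2 <= oy2 + margin)


def box_in_box_count(dets: List[Detection], score_thr: float = 0.5) -> int:
    """Count (outer, inner) pairs of confident same-class detections
    where one box encloses the other."""
    confident = [d for d in dets if d.score >= score_thr]
    count = 0
    for outer in confident:
        for inner in confident:
            if outer is inner or outer.label != inner.label:
                continue
            if contains(outer.box, inner.box):
                count += 1
    return count


def select_images_to_annotate(predictions: Dict[str, List[Detection]],
                              budget: int) -> List[str]:
    """Pick the `budget` images whose predictions exhibit the most
    box-in-box pairs; these would be fully annotated and used to
    fine-tune the weakly-supervised detector."""
    ranked = sorted(predictions,
                    key=lambda img: box_in_box_count(predictions[img]),
                    reverse=True)
    return ranked[:budget]


# Toy usage: one image with a box-in-box pair, one without.
if __name__ == "__main__":
    preds = {
        "img_001.jpg": [Detection((10, 10, 200, 200), 0.9, 3),
                        Detection((50, 60, 120, 150), 0.8, 3)],
        "img_002.jpg": [Detection((5, 5, 80, 80), 0.9, 1)],
    }
    print(select_images_to_annotate(preds, budget=1))  # -> ['img_001.jpg']
```

Note that the full BiB strategy described in the paper also spreads the annotation budget across images (e.g., for diversity); this sketch only illustrates the containment-based failure-mode cue.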