Weakly supervised learning has emerged as a compelling tool for object detection because it reduces the need for strong supervision during training. However, major challenges remain: (1) differentiating object instances can be ambiguous; (2) detectors tend to focus on discriminative parts rather than entire objects; (3) without ground truth, object proposals must be redundant to achieve high recall, which causes significant memory consumption. Addressing these challenges is difficult, as it often requires eliminating uncertainties and trivial solutions. To target these issues, we develop an instance-aware and context-focused unified framework. It employs an instance-aware self-training algorithm and a learnable Concrete DropBlock, and it introduces a memory-efficient sequential batch back-propagation. Our proposed method achieves state-of-the-art results on COCO ($12.1\%$ AP, $24.8\%$ $\text{AP}_{50}$), VOC 2007 ($54.9\%$ AP), and VOC 2012 ($52.1\%$ AP), improving on baselines by large margins. In addition, the proposed method is the first to benchmark ResNet-based models and weakly supervised video object detection. Code, models, and more details will be made available at: https://github.com/NVlabs/wetectron.