Active learning for object detection conventionally applies techniques developed for classification, aggregating individual detections into image-level selection criteria. This is typically coupled with the costly assumption that every image selected for labeling must be exhaustively annotated. The resulting methods yield incremental improvements on well-curated vision datasets and struggle with the data imbalance and visual clutter that occur in real-world imagery. Alternatives to the image-level approach are surprisingly under-explored in the literature. In this work, we introduce a new strategy that subsumes previous image-level and object-level approaches into a generalized region-level approach, which promotes spatial diversity by avoiding nearby redundant queries from the same image and minimizes context switching for the labeler. We show that this approach significantly decreases labeling effort and improves rare-object search on realistic data with inherent class imbalance and cluttered scenes.
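To make the idea of spatially diverse, region-level querying concrete, the following is a minimal illustrative sketch and not the authors' method. It assumes each candidate region carries an informativeness score (for example, detector uncertainty) and a center point, and it greedily skips candidates that lie within a chosen distance of an already selected region in the same image. The names `Candidate`, `select_regions`, `group_for_labeler`, and the parameters `budget` and `min_dist` are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate region to query (not from the paper)."""
    image_id: str
    cx: float      # region center, x coordinate in pixels
    cy: float      # region center, y coordinate in pixels
    score: float   # assumed informativeness score; higher = query first


def select_regions(candidates, budget, min_dist):
    """Greedily pick up to `budget` regions in descending score order,
    skipping any region closer than `min_dist` to an already chosen
    region in the same image (a simple spatial-diversity heuristic)."""
    chosen = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if len(chosen) >= budget:
            break
        too_close = any(
            s.image_id == cand.image_id
            and hypot(s.cx - cand.cx, s.cy - cand.cy) < min_dist
            for s in chosen
        )
        if not too_close:
            chosen.append(cand)
    return chosen


def group_for_labeler(chosen):
    """Group selected regions by image so that a labeler handles all
    queries from one image together (fewer context switches)."""
    groups = {}
    for cand in chosen:
        groups.setdefault(cand.image_id, []).append(cand)
    return groups
```

In this sketch, setting `min_dist` to zero keeps every top-scoring region regardless of position, so queries may cluster in one part of an image, while a very large `min_dist` allows at most one query per image. This only loosely mirrors how a region-level view can sit between object-level and image-level selection.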