Annotating data for supervised learning is expensive and tedious, and we want to do as little of it as possible. To make the most of a given "annotation budget", we can turn to active learning (AL), which aims to identify the most informative samples in a dataset for annotation. Active learning algorithms are typically uncertainty-based or diversity-based. Both have seen success in image classification, but both fall short when it comes to object detection. We hypothesise that this is because: (1) it is difficult to quantify uncertainty for object detection, as it consists of both localisation and classification, where some classes are harder to localise and others are harder to classify; (2) it is difficult to measure similarities for diversity-based AL when images contain different numbers of objects. We propose a two-stage active learning algorithm, Plug and Play Active Learning (PPAL), that overcomes these difficulties. It consists of (1) Difficulty Calibrated Uncertainty Sampling, in which we use a category-wise difficulty coefficient that takes both classification and localisation into account to re-weight object uncertainties for uncertainty-based sampling; (2) Category Conditioned Matching Similarity, which computes the similarities of multi-instance images as ensembles of their instance similarities. PPAL is highly generalisable because it makes no change to model architectures or detector training pipelines. We benchmark PPAL on the MS-COCO and Pascal VOC datasets using different detector architectures and show that our method outperforms the prior state-of-the-art. Code is available at https://github.com/ChenhongyiYang/PPAL
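The two stages described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and the abstract gives no formulas, so every concrete choice here is an assumption: per-object uncertainty is taken to be the entropy of the class scores, the difficulty coefficients are supplied as a plain dictionary rather than learned during training, instance similarity is cosine similarity between hypothetical feature vectors, and the diversity stage is a simple greedy farthest-first selection. All function and parameter names (`image_uncertainty`, `image_similarity`, `select_for_annotation`, `num_candidates`) are hypothetical.

```python
import math

# A detection is (category, class_probabilities, feature_vector).
# All three fields are illustrative assumptions, not the paper's data format.

def entropy(probs):
    """Shannon entropy of a class-probability vector (assumed uncertainty measure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def image_uncertainty(detections, difficulty):
    """Stage 1 score: object uncertainties re-weighted by category difficulty."""
    return sum(difficulty[cat] * entropy(probs) for cat, probs, _ in detections)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def one_way_similarity(a, b):
    """Average, over objects in a, of the best same-category match in b (0 if none)."""
    if not a:
        return 0.0
    total = 0.0
    for cat, _, feat in a:
        matches = [cosine(feat, f) for c, _, f in b if c == cat]
        total += max(matches, default=0.0)
    return total / len(a)

def image_similarity(a, b):
    """Symmetric image similarity built from category-conditioned instance matches."""
    return 0.5 * (one_way_similarity(a, b) + one_way_similarity(b, a))

def select_for_annotation(pool, difficulty, budget, num_candidates):
    """Two-stage selection: uncertainty shortlist, then greedy diverse subset."""
    # Stage 1: keep the most uncertain images as candidates.
    ranked = sorted(pool, key=lambda n: image_uncertainty(pool[n], difficulty),
                    reverse=True)
    candidates = ranked[:max(budget, num_candidates)]
    if not candidates or budget <= 0:
        return []
    # Stage 2: repeatedly add the candidate least similar to those already chosen.
    chosen = [candidates[0]]
    while len(chosen) < min(budget, len(candidates)):
        remaining = [c for c in candidates if c not in chosen]
        nxt = min(remaining,
                  key=lambda c: max(image_similarity(pool[c], pool[s]) for s in chosen))
        chosen.append(nxt)
    return chosen

# Toy pool: img_a and img_b are near-duplicates, img_c differs, img_d is confident.
pool = {
    "img_a": [("dog", [0.5, 0.5], [1.0, 0.0])],
    "img_b": [("dog", [0.5, 0.5], [1.0, 0.0])],
    "img_c": [("cat", [0.5, 0.5], [0.0, 1.0])],
    "img_d": [("cat", [0.99, 0.01], [0.0, 1.0])],
}
difficulty = {"cat": 1.0, "dog": 2.0}  # assumed coefficients; "dog" treated as harder
selected = select_for_annotation(pool, difficulty, budget=2, num_candidates=3)
print(selected)  # ['img_a', 'img_c']
```

In this toy pool, uncertainty alone would pick the two near-duplicate images, whereas the second stage swaps one of them for a dissimilar image. Because the sketch only consumes detector outputs, it leaves the model and training pipeline untouched, which is the sense in which the abstract calls the method plug and play.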