Labeling a large set of data is expensive. Active learning aims to tackle this problem by requesting annotations for only the most informative data from the unlabeled set. We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative. We discover that the loss of a simple self-supervised pretext task, such as rotation prediction, is closely correlated with the downstream task loss. Before the active learning iterations, the pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and split into batches by their pretext task losses. In each active learning iteration, the main task model is used to sample the most uncertain data within a batch for annotation. We evaluate our method on various image classification and segmentation benchmarks and achieve compelling performance on CIFAR10, Caltech-101, ImageNet, and Cityscapes. We further show that our method performs well on imbalanced datasets, and can be an effective solution to the cold-start problem, where active learning performance is degraded by the randomly sampled initial labeled set.
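The sampling procedure described above can be illustrated with a minimal sketch: sort the unlabeled pool by pretext-task loss, split it into batches, and within the current batch query the samples on which the main task model is least confident. The function names, the use of maximum softmax probability as the uncertainty measure, and the placeholder data below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def split_by_pretext_loss(pretext_losses, num_batches):
    """Sort unlabeled indices by pretext-task loss (hardest first) and
    split them into batches, one per active-learning iteration."""
    order = np.argsort(pretext_losses)[::-1]
    return np.array_split(order, num_batches)

def select_for_annotation(batch_indices, main_task_probs, budget):
    """Within one batch, pick the samples the main task model is least
    confident about (lowest maximum softmax probability)."""
    confidence = main_task_probs[batch_indices].max(axis=1)
    least_confident = np.argsort(confidence)[:budget]
    return batch_indices[least_confident]

# Illustrative usage with synthetic values: pretext_losses would come from a
# rotation-prediction network trained on the unlabeled pool, and
# main_task_probs from the current main task classifier.
rng = np.random.default_rng(0)
pretext_losses = rng.random(1000)                      # placeholder losses
main_task_probs = rng.dirichlet(np.ones(10), size=1000)  # placeholder softmax outputs
batches = split_by_pretext_loss(pretext_losses, num_batches=10)
query = select_for_annotation(batches[0], main_task_probs, budget=100)
```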