Labeling a large set of data is expensive. Active learning aims to tackle this problem by requesting annotations for only the most informative samples from the unlabeled set. We propose a novel active learning approach that uses self-supervised pretext tasks together with a dedicated data sampler to select data that are both difficult and representative. We observe that the loss of a simple self-supervised pretext task, such as rotation prediction, is closely correlated with the downstream task loss. The pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted by their pretext task losses and grouped into batches. In each iteration, the main task model samples the most uncertain data within a batch to be annotated. We evaluate our method on various image classification and segmentation benchmarks and achieve compelling performance on CIFAR10, Caltech-101, ImageNet, and CityScapes.
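A minimal sketch of the batch-wise selection described above, under several assumptions: `pretext_losses` stands in for per-sample losses of a rotation-prediction model trained on the unlabeled pool, `main_probs` for the main task model's softmax outputs on that pool, and predictive entropy is used as one possible uncertainty measure; the array sizes and budget are illustrative, not the paper's settings.

```python
import numpy as np

# Stand-in data (hypothetical): pretext losses and main-model probabilities
# for an unlabeled pool of 1000 samples and 10 classes.
rng = np.random.default_rng(0)
num_unlabeled, num_classes = 1000, 10
pretext_losses = rng.random(num_unlabeled)
main_probs = rng.dirichlet(np.ones(num_classes), num_unlabeled)

num_batches, budget_per_iter = 10, 20

# Sort the unlabeled pool by pretext task loss (hardest first) and split it
# into equal batches, so each batch covers a distinct difficulty level.
order = np.argsort(pretext_losses)[::-1]
batches = np.array_split(order, num_batches)

def uncertainty(probs):
    # Predictive entropy of the main task model over the class distribution.
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

labeled = []
for batch in batches:
    # Within the current batch, pick the samples the main model is least sure
    # about and send them for annotation.
    scores = uncertainty(main_probs[batch])
    picked = batch[np.argsort(scores)[::-1][:budget_per_iter]]
    labeled.extend(picked.tolist())
    # (In a full pipeline the main task model would be retrained on the
    # labeled set here before processing the next batch.)

print(f"selected {len(labeled)} samples for annotation")
```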