Using Self-Supervised Pretext Tasks for Active Learning

Labeling a large set of data is expensive. Active learning aims to tackle this problem by requesting annotations for only the most informative data in the unlabeled set. We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative. We discover that the loss of a simple self-supervised pretext task, such as rotation prediction, is closely correlated with the downstream task loss. Before the active learning iterations, the pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and split into batches by their pretext task losses. In each active learning iteration, the main task model is used to sample the most uncertain data in a batch to be annotated. We evaluate our method on various image classification and segmentation benchmarks and achieve compelling performance on CIFAR10, Caltech-101, ImageNet, and Cityscapes. We further show that our method performs well on imbalanced datasets, and can be an effective solution to the cold-start problem, where active learning performance is affected by the randomly sampled initial labeled set.
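The sampling procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the descending sort order, the one-batch-per-round schedule, and the `score(round_index, sample_index)` callback that stands in for the main task model's uncertainty are all assumptions made for illustration. The pretext losses are taken as given, where a real pipeline would obtain them from a trained pretext learner such as a rotation predictor.

```python
from typing import Callable, List, Sequence


def split_by_pretext_loss(losses: Sequence[float], num_batches: int) -> List[List[int]]:
    """Sort sample indices by pretext loss and cut them into contiguous batches.

    Assumption: samples are ordered from highest to lowest loss; the abstract
    only says the data are "sorted and split into batches".
    """
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    size, extra = divmod(len(order), num_batches)
    batches, start = [], 0
    for b in range(num_batches):
        # Spread any remainder over the first `extra` batches.
        end = start + size + (1 if b < extra else 0)
        batches.append(order[start:end])
        start = end
    return batches


def active_learning_rounds(
    losses: Sequence[float],
    num_rounds: int,
    budget: int,
    score: Callable[[int, int], float],
) -> List[int]:
    """Pick `budget` samples per round from that round's pretext-loss batch.

    `score(round_index, sample_index)` is a hypothetical stand-in for the
    main task model's uncertainty on a sample at a given round.
    """
    labeled: List[int] = []
    for round_index, batch in enumerate(split_by_pretext_loss(losses, num_rounds)):
        # Rank the batch by uncertainty and keep the most uncertain samples.
        ranked = sorted(batch, key=lambda i: score(round_index, i), reverse=True)
        labeled.extend(ranked[:budget])
        # A real loop would annotate these samples and retrain the main model here.
    return labeled


if __name__ == "__main__":
    toy_losses = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]
    # Toy uncertainty: pretend higher-indexed samples are more uncertain.
    picked = active_learning_rounds(toy_losses, num_rounds=3, budget=1,
                                    score=lambda r, i: float(i))
    print(picked)  # [4, 3, 5]
```

Restricting each round to one loss-ordered batch is what ties the difficulty signal (pretext loss) to the selection step (main-task uncertainty) in this sketch; how the first round is handled before any main task model exists is left out here.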

Related content

Active learning

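Active learning is a subfield of machine learning (and, more broadly, of artificial intelligence); in statistics it is also known as query learning or optimal experimental design. The "learning module" and the "selection strategy" are the two basic and essential components of an active learning algorithm. In education, the same term refers to "a method of learning in which students are actively or experientially involved in the learning process and where there are different levels of active learning, depending on student involvement" (Bonwell & Eison 1991). Bonwell & Eison (1991) state that "students participate when they are doing something besides passively listening." In a report for the Association for the Study of Higher Education (ASHE), the authors discuss a variety of methods for promoting active learning. They cite literature indicating that, in order to learn, students must do more than just listen: they must read, write, discuss, and engage in solving problems. This process relates to the three learning domains of knowledge, skills, and attitudes (KSA), a taxonomy of learning behaviors that can be thought of as "the goals of the learning process." In particular, students must engage in higher-order thinking tasks such as analysis, synthesis, and evaluation.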