While deep learning succeeds in a wide range of tasks, it relies heavily on massive amounts of annotated data, which are expensive and time-consuming to collect. To lower the cost of data annotation, active learning has been proposed to interactively query an oracle to annotate a small proportion of informative samples in an unlabeled dataset. Inspired by the fact that samples with higher loss are usually more informative to the model than samples with lower loss, in this paper we present a novel deep active learning approach that queries the oracle for annotation when an unlabeled sample is believed to incur high loss. The core of our approach is a measurement, Temporal Output Discrepancy (TOD), which estimates the sample loss by evaluating the discrepancy between outputs given by models at different optimization steps. Our theoretical investigation shows that TOD lower-bounds the accumulated sample loss and thus can be used to select informative unlabeled samples. On the basis of TOD, we further develop an effective unlabeled data sampling strategy as well as an unsupervised learning criterion that enhances model performance by incorporating the unlabeled data. Owing to the simplicity of TOD, our active learning approach is efficient, flexible, and task-agnostic. Extensive experimental results demonstrate that our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
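The sampling idea described above can be sketched in a few lines: compute, for every unlabeled sample, the distance between the outputs of two model snapshots taken at different optimization steps, and query the oracle for the samples with the largest discrepancy. The sketch below is a minimal illustration under toy assumptions (a linear map standing in for a deep network, random data, the L2 norm as the discrepancy measure); it is not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_output(x, weights):
    """Toy model: a linear map standing in for a deep network's output layer."""
    return x @ weights

# Snapshots of the (toy) model parameters at two optimization steps.
w_early = rng.normal(size=(8, 3))
w_later = w_early + 0.1 * rng.normal(size=(8, 3))  # parameters after further training

# Pool of unlabeled samples (random stand-ins for real inputs).
unlabeled = rng.normal(size=(100, 8))

# Temporal Output Discrepancy per sample: distance between the two snapshots' outputs.
tod = np.linalg.norm(
    model_output(unlabeled, w_later) - model_output(unlabeled, w_early), axis=1
)

# Query the oracle for the k samples with the largest discrepancy
# (highest estimated loss, hence presumed most informative).
k = 10
query_indices = np.argsort(tod)[-k:]
```

Because the measurement only compares cached model outputs, it needs no gradients or auxiliary networks at selection time, which is what makes the approach efficient and task-agnostic.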