While deep learning succeeds in a wide range of tasks, it relies heavily on massive collections of annotated data, which are expensive and time-consuming to obtain. To lower the cost of data annotation, active learning has been proposed to interactively query an oracle to annotate a small proportion of informative samples in an unlabeled dataset. Inspired by the fact that samples with higher loss are usually more informative to the model than samples with lower loss, in this paper we present a novel deep active learning approach that queries the oracle for annotation when an unlabeled sample is believed to incorporate high loss. The core of our approach is a measurement, Temporal Output Discrepancy (TOD), which estimates the loss of a sample by evaluating the discrepancy between the outputs given by models at different optimization steps. Our theoretical investigation shows that TOD lower-bounds the accumulated sample loss, and thus it can be used to select informative unlabeled samples. On the basis of TOD, we further develop an effective unlabeled data sampling strategy as well as an unsupervised learning criterion for active learning. Due to the simplicity of TOD, our methods are efficient, flexible, and task-agnostic. Extensive experimental results demonstrate that our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks. In addition, we show that TOD can be utilized to select, from a pool of candidate models, the model with potentially the highest testing accuracy.
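To make the TOD measurement concrete, the following is a minimal PyTorch sketch of how the idea described above could be implemented. The function names (`temporal_output_discrepancy`, `query_by_tod`), the use of a plain per-sample L2 distance, and the top-k query strategy are illustrative assumptions based only on the abstract, not the paper's exact formulation.

```python
import torch

def temporal_output_discrepancy(model_t, model_t_plus_T, x):
    # Hypothetical sketch of the TOD measurement: model_t and
    # model_t_plus_T are snapshots of the same network taken T
    # optimization steps apart; x is a batch of unlabeled inputs.
    with torch.no_grad():
        out_early = model_t(x)        # f(x; w_t)
        out_late = model_t_plus_T(x)  # f(x; w_{t+T})
    # Per-sample L2 distance between the two outputs; larger values
    # suggest higher (unknown) loss, i.e., more informative samples.
    return (out_late - out_early).flatten(start_dim=1).norm(dim=1)

def query_by_tod(model_t, model_t_plus_T, unlabeled_batch, budget):
    # Rank unlabeled samples by TOD and return the indices of the
    # `budget` samples with the largest estimated discrepancy,
    # which would then be sent to the oracle for annotation.
    scores = temporal_output_discrepancy(model_t, model_t_plus_T, unlabeled_batch)
    return torch.topk(scores, k=budget).indices
```

In practice, the second snapshot could be kept with `copy.deepcopy` of the model before training for another T steps; since the measurement only requires two forward passes per sample and no labels, it stays cheap and task-agnostic, consistent with the efficiency claims above.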