While deep learning succeeds in a wide range of tasks, it relies heavily on massive amounts of annotated data, which are expensive and time-consuming to collect. To lower the cost of data annotation, active learning has been proposed to interactively query an oracle to annotate a small proportion of informative samples in an unlabeled dataset. Inspired by the fact that samples with higher loss are usually more informative to the model than samples with lower loss, in this paper we present a novel deep active learning approach that queries the oracle for annotation when an unlabeled sample is believed to incur high loss. The core of our approach is a measurement, Temporal Output Discrepancy (TOD), which estimates the sample loss by evaluating the discrepancy between outputs given by models at different optimization steps. Our theoretical investigation shows that TOD lower-bounds the accumulated sample loss and thus can be used to select informative unlabeled samples. On the basis of TOD, we further develop an effective unlabeled data sampling strategy as well as an unsupervised learning criterion that enhances model performance by incorporating the unlabeled data. Owing to the simplicity of TOD, our active learning approach is efficient, flexible, and task-agnostic. Extensive experimental results demonstrate that our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
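The sampling idea described above can be sketched in a few lines: compute, for every unlabeled sample, the distance between the outputs of two model snapshots taken at different optimization steps, and query the oracle for the samples with the largest discrepancy. The sketch below is a minimal illustration under toy assumptions (a linear map standing in for a deep network, random data, the L2 norm as the discrepancy measure); it is not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_output(x, weights):
    """Toy model: a linear map standing in for a deep network's output layer."""
    return x @ weights

# Snapshots of the (toy) model parameters at two optimization steps.
w_early = rng.normal(size=(8, 3))
w_later = w_early + 0.1 * rng.normal(size=(8, 3))  # parameters after further training

# Pool of unlabeled samples (random stand-ins for real inputs).
unlabeled = rng.normal(size=(100, 8))

# Temporal Output Discrepancy per sample: distance between the two snapshots' outputs.
tod = np.linalg.norm(
    model_output(unlabeled, w_later) - model_output(unlabeled, w_early), axis=1
)

# Query the oracle for the k samples with the largest discrepancy
# (highest estimated loss, hence presumed most informative).
k = 10
query_indices = np.argsort(tod)[-k:]
```

Because the measurement only compares cached model outputs, it needs no gradients or auxiliary networks at selection time, which is what makes the approach efficient and task-agnostic.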