With the goal of making deep learning more label-efficient, a growing number of papers have been studying active learning (AL) for deep models. However, there are a number of issues in the prevalent experimental settings, mainly stemming from a lack of unified implementation and benchmarking. Issues in the current literature include sometimes contradictory observations on the performance of different AL algorithms, unintended exclusion of important generalization approaches such as data augmentation and SGD for optimization, a lack of study of evaluation facets like the labeling efficiency of AL, and little or no clarity on the scenarios in which AL outperforms random sampling (RS). In this work, we present a unified re-implementation of state-of-the-art AL algorithms in the context of image classification, and we carefully study these issues as facets of effective evaluation. On the positive side, we show that AL techniques are 2x to 4x more label-efficient than RS when data augmentation is used. Surprisingly, when data augmentation is included, there is no longer a consistent gain in using BADGE, a state-of-the-art approach, over simple uncertainty sampling. We then do a careful analysis of how existing approaches perform with varying amounts of redundancy and number of examples per class. Finally, we provide several insights for AL practitioners to consider in future work, such as the effect of the AL batch size, the effect of initialization, and the importance of retraining a new model at every round.
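The simple uncertainty-sampling baseline discussed above can be sketched as follows. This is an illustrative least-confidence variant, not the paper's implementation; the function name and the toy probability matrix are invented for the example, and in practice the probabilities would come from the model retrained at each AL round.

```python
import numpy as np

def uncertainty_sample(probs, batch_size):
    """Least-confidence sampling: pick the batch_size unlabeled examples
    whose top predicted class probability is lowest."""
    confidence = probs.max(axis=1)              # top-class probability per example
    return np.argsort(confidence)[:batch_size]  # indices, least confident first

# Toy pool of 4 unlabeled examples over 3 classes.
pool_probs = np.array([
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.34, 0.33, 0.33],  # most uncertain
    [0.70, 0.20, 0.10],
])
picked = uncertainty_sample(pool_probs, batch_size=2)
# Selects the two near-uniform rows (indices 2 and 1) for labeling.
```

In a full AL loop, the selected indices would be sent for labeling, added to the training set, and the model retrained before the next query round.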