With the goal of making deep learning more label-efficient, a growing number of papers have studied active learning (AL) for deep models. However, the prevalent experimental settings suffer from several issues, mainly stemming from a lack of unified implementation and benchmarking. Issues in the current literature include sometimes contradictory observations on the performance of different AL algorithms, unintended exclusion of important generalization techniques such as data augmentation and SGD for optimization, a lack of study of evaluation facets like the labeling efficiency of AL, and little or no clarity on the scenarios in which AL outperforms random sampling (RS). In this work, we present a unified re-implementation of state-of-the-art AL algorithms in the context of image classification via our new open-source AL toolkit DISTIL, and we carefully study these issues as facets of effective evaluation. On the positive side, we show that AL techniques are $2\times$ to $4\times$ more label-efficient than RS when data augmentation is used. Surprisingly, once data augmentation is included, there is no longer a consistent gain from using BADGE, a state-of-the-art approach, over simple uncertainty sampling. We then carefully analyze how existing approaches perform with varying amounts of redundancy and varying numbers of examples per class. Finally, we provide several insights for AL practitioners to consider in future work, such as the effect of the AL batch size, the effect of initialization, and the importance of retraining the model at every round.
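As a concrete illustration of the simple uncertainty-sampling baseline the abstract compares against BADGE, the following is a minimal sketch of entropy-based selection. The function name `uncertainty_sample` and its interface are hypothetical helpers for illustration, not DISTIL's actual API:

```python
import numpy as np

def uncertainty_sample(probs, batch_size):
    """Pick the indices of the unlabeled points whose predicted class
    distributions have the highest entropy (most uncertain).

    Hypothetical helper for illustration, not DISTIL's actual API.
    probs: (n_unlabeled, n_classes) array of softmax outputs.
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # avoid log(0) for confident predictions
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Sort by entropy descending and keep the batch_size most uncertain.
    return np.argsort(entropy)[::-1][:batch_size].tolist()

# Toy example: three unlabeled points, three classes.
probs = [[0.98, 0.01, 0.01],   # confident -> low entropy
         [0.34, 0.33, 0.33],   # near-uniform -> highest entropy
         [0.70, 0.20, 0.10]]
print(uncertainty_sample(probs, 2))  # -> [1, 2]
```

In a full AL loop, the selected indices would be sent for labeling, added to the training set, and the model retrained before the next round of selection.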