Active learning is a label-efficient approach to training highly effective models while interactively selecting only small subsets of unlabeled data for labeling and training. In "open world" settings, the classes of interest can make up a small fraction of the overall dataset, and most of the data may be viewed as an out-of-distribution or irrelevant class. This leads to extreme class imbalance, and our theory and methods focus on this core issue. We propose a new strategy for active learning called GALAXY (Graph-based Active Learning At the eXtrEme), which blends ideas from graph-based active learning and deep learning. GALAXY automatically and adaptively selects more class-balanced examples for labeling than most other methods for active learning. Our theory shows that GALAXY performs a refined form of uncertainty sampling that gathers a much more class-balanced dataset than vanilla uncertainty sampling. Experimentally, we demonstrate GALAXY's superiority over existing state-of-the-art deep active learning algorithms in unbalanced vision classification settings generated from popular datasets.
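
For context, the vanilla uncertainty sampling baseline that the abstract contrasts with GALAXY can be sketched as below. This is a minimal illustration of margin-based selection, not GALAXY itself; the function names `margin` and `select_most_uncertain` and the toy probabilities are assumptions made for illustration only.

```python
from typing import List, Sequence, Set


def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities (small = uncertain)."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_most_uncertain(
    pool_probs: List[Sequence[float]], labeled: Set[int], batch_size: int
) -> List[int]:
    """Return indices of the unlabeled pool examples with the smallest margins."""
    # Only examples that have not been labeled yet are eligible.
    candidates = [i for i in range(len(pool_probs)) if i not in labeled]
    # Most uncertain first: the smallest margin sorts to the front.
    candidates.sort(key=lambda i: margin(pool_probs[i]))
    return candidates[:batch_size]


# Hypothetical toy pool: predicted probabilities over
# [class of interest, out-of-distribution class].
pool_probs = [
    [0.05, 0.95],
    [0.48, 0.52],
    [0.10, 0.90],
    [0.55, 0.45],
    [0.30, 0.70],
]

# Example 1 is already labeled, so it is skipped even though it is the most uncertain.
picked = select_most_uncertain(pool_probs, labeled={1}, batch_size=2)
print(picked)
```

The selection rule looks only at model confidence and never at how many examples of each class have already been labeled, which is the property the abstract says GALAXY refines in order to gather a more class-balanced labeled set.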