Deep active learning aims to reduce the annotation cost of training deep models, which are notoriously data-hungry. Until recently, deep active learning methods were ineffectual in the low-budget regime, where only a small number of examples are annotated. The situation has been alleviated by recent advances in representation and self-supervised learning, which endow the geometry of the data representation with rich information about the points. Taking advantage of this progress, we study the problem of subset selection for annotation through a "covering" lens, proposing ProbCover - a new active learning algorithm for the low-budget regime, which seeks to maximize Probability Coverage. We then describe a dual way to view the proposed formulation, from which one can derive strategies suitable for the high-budget regime of active learning, related to existing methods like Coreset. We conclude with extensive experiments, evaluating ProbCover in the low-budget regime. We show that our principled active learning strategy improves the state of the art in the low-budget regime on several image recognition benchmarks. This method is especially beneficial in the semi-supervised setting, allowing state-of-the-art semi-supervised methods to match the performance of fully supervised methods while using far fewer labels. Code is available at https://github.com/avihu111/TypiClust.
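The core selection idea described above can be sketched as a greedy maximum-coverage procedure: pick the point whose δ-ball in the self-supervised representation space covers the most not-yet-covered points, then repeat until the budget is exhausted. The sketch below is a minimal illustration under that assumption; the function name `prob_cover_select` and parameters `budget` and `delta` are hypothetical and not taken from the paper's implementation.

```python
# Minimal sketch of greedy probability-coverage selection (an assumption
# based on the abstract; not the authors' exact implementation).
import numpy as np


def prob_cover_select(embeddings: np.ndarray, budget: int, delta: float) -> list:
    """Greedily pick `budget` points whose delta-balls cover the most points."""
    # Pairwise Euclidean distances in the representation space.
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    covered_by = dists <= delta  # covered_by[i, j]: the ball around i covers j
    uncovered = np.ones(len(embeddings), dtype=bool)
    selected = []
    for _ in range(budget):
        # Marginal gain: how many still-uncovered points each candidate covers.
        gains = (covered_by & uncovered[None, :]).sum(axis=1)
        best = int(np.argmax(gains))
        selected.append(best)
        uncovered &= ~covered_by[best]
    return selected


# Toy usage: two well-separated clusters; a budget of 2 should place
# one query in each cluster.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
picks = prob_cover_select(X, budget=2, delta=1.0)
```

With a well-chosen δ this greedy rule is the classic approximation for maximum coverage; the abstract's dual view connects the same objective to Coreset-style strategies in the high-budget regime.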