We investigate active learning, focusing on the relation between the number of labeled examples (the budget size) and suitable querying strategies. Our theoretical analysis shows a behavior reminiscent of a phase transition: typical examples are best queried when the budget is low, while unrepresentative examples are best queried when the budget is large. Combined evidence shows that a similar phenomenon occurs in common classification models. Accordingly, we propose TypiClust, a deep active learning strategy suited for low budgets. In a comparative empirical investigation of supervised learning, using a variety of architectures and image datasets, TypiClust outperforms all other active learning strategies in the low-budget regime. Using TypiClust in the semi-supervised framework yields an even more significant boost in performance. In particular, state-of-the-art semi-supervised methods trained on CIFAR-10 with 10 labeled examples selected by TypiClust reach 93.2% accuracy, an improvement of 39.4% over random selection. Code is available at https://github.com/avihu111/TypiClust.