Active learning aims to optimize the dataset annotation process when resources are constrained. Most existing methods are designed for balanced datasets, which limits their practical applicability because a majority of real-life datasets are actually imbalanced. Here, we introduce a new active learning method designed for imbalanced datasets. It favors samples likely to belong to minority classes so as to reduce the imbalance of the labeled subset and create a better representation of these classes. We also compare two training schemes for active learning: (1) the one commonly deployed in deep active learning, which fine-tunes the model at each iteration, and (2) a scheme inspired by transfer learning, which exploits generic pre-trained models and trains shallow classifiers at each iteration. Evaluation is run on three imbalanced datasets. Results show that the proposed active learning method outperforms competitive baselines. Equally interesting, they also indicate that the transfer learning training scheme outperforms model fine-tuning when features are transferable from the generic dataset to the unlabeled one. This last result is surprising and should encourage the community to further explore the design of deep active learning methods.
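To make the second training scheme and the minority-favoring acquisition concrete, here is a minimal sketch of one active learning loop. It assumes image features have already been extracted offline with a frozen generic pre-trained backbone, and it uses a simple assumed selection criterion (expected class rarity under the current shallow classifier) as a stand-in for the paper's minority-class-oriented method; the function `active_learning_loop` and all its parameters are hypothetical names, not the authors' API.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(features, labels_oracle, seed_idx, budget_per_iter, n_iters):
    """Sketch of transfer-learning-style active learning on frozen features.

    features      : (N, D) array of backbone features for the unlabeled pool
    labels_oracle : (N,) array, queried only for samples we choose to label
    seed_idx      : indices of the initial labeled subset
    """
    labeled_idx = list(seed_idx)
    unlabeled_idx = [i for i in range(len(features)) if i not in labeled_idx]
    clf = None
    for _ in range(n_iters):
        # Scheme (2): retrain a shallow classifier from scratch each iteration
        # on the frozen features, instead of fine-tuning a deep model.
        clf = LogisticRegression(max_iter=1000)
        clf.fit(features[labeled_idx], labels_oracle[labeled_idx])

        # Score unlabeled samples, favoring those predicted to belong to
        # classes under-represented in the current labeled subset (an assumed
        # criterion illustrating the minority-class-oriented idea).
        probs = clf.predict_proba(features[unlabeled_idx])
        counts = np.bincount(labels_oracle[labeled_idx], minlength=probs.shape[1])
        rarity = 1.0 / (counts[clf.classes_] + 1)   # higher for minority classes
        scores = probs @ rarity                     # expected rarity per sample

        # Label the highest-scoring samples and move them to the labeled set.
        picked = np.argsort(scores)[-budget_per_iter:]
        new_idx = [unlabeled_idx[i] for i in picked]
        labeled_idx.extend(new_idx)
        unlabeled_idx = [i for i in unlabeled_idx if i not in new_idx]
    return clf, labeled_idx
```

Because the backbone is frozen, each iteration only retrains a cheap shallow classifier, which is what makes this scheme attractive compared with fine-tuning the full deep model at every active learning step.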