Existing active learning studies typically work in the closed-set setting, assuming that all data examples to be labeled are drawn from known classes. However, in real annotation tasks, the unlabeled data usually contains a large number of examples from unknown classes, which causes most active learning methods to fail. To tackle this open-set annotation (OSA) problem, we propose a new active learning framework called LfOSA, which boosts classification performance with an effective sampling strategy that precisely detects examples from known classes for annotation. The LfOSA framework introduces an auxiliary network that models the per-example max activation value (MAV) distribution with a Gaussian Mixture Model, which can dynamically select the unlabeled examples with the highest probability of belonging to known classes. Moreover, by reducing the temperature $T$ of the loss function, the detection model is further optimized by exploiting both known and unknown supervision. The experimental results show that the proposed method significantly improves the selection quality of known classes and achieves higher classification accuracy with lower annotation cost than state-of-the-art active learning methods. To the best of our knowledge, this is the first work on active learning for open-set annotation.
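The MAV-plus-GMM selection idea can be illustrated with a minimal, dependency-free sketch. It rests on several assumptions that are not taken from the abstract: the detector emits one logit vector per unlabeled example, indices at or above `num_known` stand for "unknown", the MAV is simply the largest logit, and a two-component 1-D Gaussian mixture is fitted by plain EM to the MAVs of each predicted known class, with the posterior of the higher-mean component used as the "known" score. All function names (`select_known`, `fit_gmm_1d`, `tempered_cross_entropy`, and so on) are hypothetical; this is not the LfOSA authors' implementation. The temperature helper only shows how dividing logits by $T < 1$ sharpens the softmax.

```python
import math
from collections import defaultdict


def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a smaller T gives a sharper distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def tempered_cross_entropy(logits, label, T=0.5):
    """Cross-entropy computed on logits divided by temperature T (illustrative only)."""
    return -math.log(softmax(logits, T)[label])


def _log_normal(x, mu, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def _posteriors(x, pi, mu, var):
    """Component responsibilities for x, computed in log space for stability."""
    logs = [math.log(pi[k]) + _log_normal(x, mu[k], var[k]) for k in range(2)]
    m = max(logs)
    w = [math.exp(l - m) for l in logs]
    s = sum(w)
    return [wk / s for wk in w]


def fit_gmm_1d(xs, iters=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture to xs with plain EM."""
    lo, hi = min(xs), max(xs)
    mu = [lo, hi]                      # initialise means at the extremes
    spread = (hi - lo) / 4 or 1.0
    var = [max(spread ** 2, min_var)] * 2
    pi = [0.5, 0.5]
    for _ in range(iters):
        resp = [_posteriors(x, pi, mu, var) for x in xs]          # E-step
        for k in range(2):                                        # M-step
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, min_var)
            pi[k] = nk / len(xs)
    return pi, mu, var


def select_known(logits_list, budget, num_known):
    """Return up to `budget` indices judged most likely to come from known classes.

    Hypothetical assumptions: predictions >= num_known mean "unknown" and are
    skipped; the MAV is the max logit; one GMM is fitted per predicted class.
    """
    by_class = defaultdict(list)
    for i, logits in enumerate(logits_list):
        pred = max(range(len(logits)), key=lambda c: logits[c])
        if pred < num_known:
            by_class[pred].append((i, max(logits)))
    scored = []
    for pairs in by_class.values():
        mavs = [v for _, v in pairs]
        if len(mavs) < 2:
            scored.extend((1.0, i) for i, _ in pairs)  # too few points for a mixture
            continue
        pi, mu, var = fit_gmm_1d(mavs)
        hi_k = 0 if mu[0] >= mu[1] else 1              # high-MAV mode = "known"
        scored.extend((_posteriors(v, pi, mu, var)[hi_k], i) for i, v in pairs)
    scored.sort(reverse=True)
    return [i for _, i in scored[:budget]]
```

For example, with two known classes plus one "unknown" output, examples whose MAV falls in the low mode of their predicted class receive a low score and are not selected, and examples predicted as unknown are never considered. Fitting the mixture per class, rather than globally, is one plausible reading of "per-example MAV distribution" that lets each class have its own activation scale.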