Active learning has proven useful for minimizing labeling costs by selecting the most informative samples. However, existing active learning methods do not work well in realistic scenarios such as class imbalance or rare classes, out-of-distribution data in the unlabeled set, and redundancy. In this work, we propose SIMILAR (Submodular Information Measures based actIve LeARning), a unified active learning framework using recently proposed submodular information measures (SIM) as acquisition functions. We argue that SIMILAR not only works in standard active learning but also extends easily to the realistic settings considered above, acting as a one-stop solution for active learning that scales to large real-world datasets. Empirically, we show that SIMILAR significantly outperforms existing active learning algorithms, by as much as ~5%-18% in the case of rare classes and ~5%-10% in the case of out-of-distribution data, on several image classification tasks such as CIFAR-10, MNIST, and ImageNet. SIMILAR is available as part of the DISTIL toolkit: "https://github.com/decile-team/distil".
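As a rough illustration of the kind of acquisition step the abstract describes (not the paper's actual implementation, which uses the DISTIL toolkit), submodular acquisition can be sketched as greedy maximization of a submodular function over pairwise similarities of unlabeled points. The facility-location function used below is one common choice; all names and parameters here are hypothetical.

```python
import numpy as np

def facility_location_gain(sim, selected, candidate):
    """Marginal gain of adding `candidate` to `selected` under the
    facility-location function f(S) = sum_i max_{j in S} sim[i, j]."""
    if not selected:
        return sim[:, candidate].sum()
    current = sim[:, selected].max(axis=1)
    return np.maximum(current, sim[:, candidate]).sum() - current.sum()

def greedy_select(sim, budget):
    """Greedily pick `budget` columns maximizing the facility-location
    objective; a standard (1 - 1/e) approximation for submodular functions."""
    selected = []
    for _ in range(budget):
        cands = [c for c in range(sim.shape[1]) if c not in selected]
        gains = [facility_location_gain(sim, selected, c) for c in cands]
        selected.append(cands[int(np.argmax(gains))])
    return selected

# Toy demo: cosine similarities over random unit feature vectors,
# shifted to [0, 1] so the objective is monotone nonnegative.
rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 5))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
sim = (feats @ feats.T + 1.0) / 2.0

batch = greedy_select(sim, budget=3)  # indices of points to label next
```

In the SIMILAR framework the plain facility-location function is replaced by submodular *information* measures (e.g. conditioned on rare-class exemplars or against out-of-distribution sets), but the greedy batch-selection loop has the same shape.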