Active Learning for discriminative models has largely been studied with a focus on individual samples, with less emphasis on how classes are distributed or which classes are hard to deal with. In this work, we show that this is harmful. We propose a method based on Bayes' rule that can naturally incorporate class imbalance into the Active Learning framework. We derive that three terms should be considered together when estimating the probability that a classifier makes a mistake on a given sample: i) the probability of mislabelling a class; ii) the likelihood of the data given the predicted class; and iii) the prior probability of the predicted class's abundance. Estimating these terms requires a generative model and an intractable likelihood computation; we therefore train a Variational Auto-Encoder (VAE) for this purpose. To further tie the VAE to the classifier and to facilitate VAE training, we use the classifier's deep feature representations as input to the VAE. By considering all three probabilities, and especially the class imbalance, we substantially improve the potential of existing methods under a limited labelling budget. We show that our method applies to classification tasks on multiple datasets -- including a real-world dataset with heavy class imbalance -- significantly outperforming the state of the art.
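One plausible formalisation of this decomposition (notation ours, not taken verbatim from the paper): writing $w$ for the event that the classifier errs on sample $x$ and $\hat{y}$ for its predicted class, and assuming $w$ depends on $x$ mainly through $\hat{y}$, Bayes' rule gives

\[
P(w \mid x) \;\approx\; P(w \mid \hat{y})\, P(\hat{y} \mid x)
\;=\; \underbrace{P(w \mid \hat{y})}_{\text{(i) class error rate}} \cdot
\frac{\overbrace{p(x \mid \hat{y})}^{\text{(ii) data likelihood}} \cdot
\overbrace{P(\hat{y})}^{\text{(iii) class prior}}}{p(x)} ,
\]

where term (ii) is the intractable likelihood that the VAE's evidence lower bound approximates, and term (iii) directly encodes class abundance.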
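Below is a minimal, hypothetical sketch of how such an acquisition score might be computed, assuming PyTorch, a small VAE over the classifier's penultimate features, and per-class error rates and priors estimated from held-out labelled data; none of these names come from the paper, and a single unconditional VAE stands in for the class-conditional likelihood for brevity.

```python
# Hypothetical sketch of the three-term acquisition score; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureVAE(nn.Module):
    """Small VAE over d-dimensional classifier features. Its ELBO is a
    tractable lower bound on log p(x), standing in for term (ii)."""
    def __init__(self, d=512, z=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d, 256), nn.ReLU())
        self.mu = nn.Linear(256, z)
        self.logvar = nn.Linear(256, z)
        self.dec = nn.Sequential(nn.Linear(z, 256), nn.ReLU(), nn.Linear(256, d))

    def elbo(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        zs = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        recon = -F.mse_loss(self.dec(zs), x, reduction="none").sum(-1)  # ~ log p(x|z)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)      # KL(q||N(0,I))
        return recon - kl  # per-sample lower bound on log p(x)

@torch.no_grad()
def mistake_log_score(feats, probs, vae, err_rate, prior, eps=1e-8):
    """log of (i) * (ii) * (iii): higher means the sample is more likely
    to be misclassified, hence a better labelling candidate.
    err_rate[c] ~ P(w | y_hat = c), estimated on a validation split;
    prior[c]    ~ P(y_hat = c), from labelled-pool class frequencies."""
    c_hat = probs.argmax(-1)   # predicted class per unlabelled sample
    log_px = vae.elbo(feats)   # (ii) likelihood proxy, in log-space
    return (err_rate[c_hat] + eps).log() + log_px + (prior[c_hat] + eps).log()
```

Unlabelled samples would then be ranked by this score and the top-k sent for annotation; working in log-space avoids the underflow that exponentiating the typically large-magnitude ELBO would cause.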