Active learning aims to select the samples whose annotation yields the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples, typically based on the certainty of the network's predictions. However, it is well known that neural networks are overly confident about their predictions and are therefore an untrustworthy source for assessing sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network: we track the label assignments of the unlabeled data pool during training. We capture these dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to a sample throughout training and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of network uncertainty, and demonstrate on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
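The abstract describes label-dispersion only qualitatively (low when the assigned label is stable across training, high when it flips often). Below is a minimal sketch of one plausible formalization consistent with that description: dispersion as one minus the fraction of training checkpoints on which the sample's most frequently predicted label appears. The function name `label_dispersion` and the variables `label_histories` and `budget` are illustrative, not from the paper.

```python
import numpy as np

def label_dispersion(predicted_labels):
    """Label-dispersion of one unlabeled sample.

    predicted_labels: 1-D array of the labels the network assigned to this
    sample at each training checkpoint (e.g., once per epoch).
    Returns 0 when the same label is predicted at every checkpoint, and
    approaches 1 when the assigned label changes frequently.
    """
    predicted_labels = np.asarray(predicted_labels)
    T = len(predicted_labels)
    # Count how often each distinct label was assigned across checkpoints,
    # then measure how dominant the single most frequent label is.
    _, counts = np.unique(predicted_labels, return_counts=True)
    return 1.0 - counts.max() / T

# Hypothetical usage: annotate the `budget` samples with the highest
# dispersion, i.e., those the network is least consistent about.
label_histories = [[2, 2, 2, 2], [0, 1, 0, 3], [1, 1, 2, 1]]
budget = 2
dispersions = np.array([label_dispersion(h) for h in label_histories])
query_indices = np.argsort(dispersions)[-budget:]
```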