ProBoost, a new boosting algorithm for probabilistic classifiers, is proposed in this work. This algorithm uses the epistemic uncertainty of each training sample to identify the most challenging/uncertain ones; the relevance of these samples is then increased for the next weak learner, producing a sequence that progressively focuses on the samples found to have the highest uncertainty. In the end, the weak learners' outputs are combined into a weighted ensemble of classifiers. Three methods are proposed to manipulate the training set: undersampling, oversampling, and weighting the training samples according to the uncertainty estimated by the weak learners. Furthermore, two approaches are studied regarding the ensemble combination. The weak learner considered herein is a standard convolutional neural network, and the probabilistic models underlying the uncertainty estimation use either variational inference or Monte Carlo dropout. The experimental evaluation carried out on the MNIST benchmark dataset shows that ProBoost yields a significant performance improvement. The results are further highlighted by assessing the relative achievable improvement, a metric proposed in this work: a model with only four weak learners achieves an improvement exceeding 12% in this metric (for accuracy, sensitivity, or specificity), in comparison with the model learned without ProBoost.
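The core loop described above can be illustrated with a minimal sketch of the sample-weighting variant. This is not the paper's implementation: the weak learner here is a toy weighted multinomial logistic regression rather than a convolutional neural network, and the epistemic-uncertainty proxy is the predictive variance across stochastic dropout-style forward passes (a simplified stand-in for Monte Carlo dropout). All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_predict(model, X, n_passes=20, drop_p=0.5):
    """Stochastic forward passes: randomly drop input features each pass
    and collect class probabilities (a toy stand-in for MC dropout)."""
    W, b = model
    probs = []
    for _ in range(n_passes):
        mask = rng.random(W.shape[0]) > drop_p
        logits = (X * mask / (1.0 - drop_p)) @ W + b
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs.append(e / e.sum(axis=1, keepdims=True))
    return np.stack(probs)                       # (n_passes, n, n_classes)

def epistemic_uncertainty(probs):
    """Per-sample variance of the class probabilities across passes,
    averaged over classes -- one common epistemic-uncertainty proxy."""
    return probs.var(axis=0).mean(axis=1)

def train_weighted(X, y, w, n_classes, lr=0.5, epochs=200):
    """Weighted multinomial logistic regression as the 'weak learner'."""
    n, d = X.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W + b
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        p = e / e.sum(axis=1, keepdims=True)
        g = (p - Y) * w[:, None]                 # weighted gradient
        W -= lr * X.T @ g / n
        b -= lr * g.sum(axis=0) / n
    return W, b

def proboost_sketch(X, y, n_classes, n_learners=4):
    """Uncertainty-driven boosting loop (sample-weighting variant):
    each round re-weights samples by the current learner's uncertainty."""
    w = np.full(len(X), 1.0 / len(X))
    ensemble = []
    for _ in range(n_learners):
        model = train_weighted(X, y, w * len(X), n_classes)
        u = epistemic_uncertainty(mc_dropout_predict(model, X))
        w = u / u.sum() if u.sum() > 0 else w    # focus on uncertain samples
        ensemble.append(model)
    return ensemble

def ensemble_predict(ensemble, X):
    """Combine weak learners by averaging their mean predictive
    distributions (a simple stand-in for the weighted combination)."""
    p = np.mean([mc_dropout_predict(m, X).mean(axis=0) for m in ensemble],
                axis=0)
    return p.argmax(axis=1)
```

On a toy two-blob classification problem, `proboost_sketch` trains four weak learners, each round concentrating the training weight on the samples the previous learner was most uncertain about; the undersampling and oversampling variants would instead resample the training set in proportion to the same uncertainty scores.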