While advanced classifiers are increasingly deployed in real-world safety-critical applications, how to properly evaluate such black-box models against specific human values remains an open concern in the community. Such human values include penalizing error cases of different severity to different degrees, and trading off general performance to reduce specific dangerous cases. In this paper, we propose a novel evaluation measure named the Meta Pattern Concern Score, which introduces human values into multi-class classifiers based on an abstract representation of probabilistic predictions and an adjustable threshold for conceding prediction confidence. Technically, we learn from the advantages and disadvantages of two common kinds of metrics, namely confusion-matrix-based evaluation measures and loss values, so that our measure remains as effective as they are on general tasks, and the cross-entropy loss becomes a special case of our measure in the limit. Moreover, our measure can also be used to refine model training by dynamically adjusting the learning rate. Experiments on four kinds of models and six datasets confirm the effectiveness and efficiency of our measure. A case study shows that it can not only find an ideal model that reduces dangerous cases by 0.53% while sacrificing only 0.04% of training accuracy, but also refine the learning rate to train a new model that, on average, outperforms the original one with a 1.62% lower measure value and 0.36% fewer dangerous cases.
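To make the two ideas in the abstract concrete, the following is a minimal, hypothetical sketch: a severity-weighted error measure standing in for a value-aware evaluation score, and a rule that lowers the learning rate when that measure stops improving. The function names, the `severity` weighting scheme, and the halving rule are illustrative assumptions, not the paper's actual Meta Pattern Concern Score or training procedure.

```python
def weighted_error_score(y_true, y_pred, severity):
    """Average severity weight over all samples; 0.0 means no errors.

    `severity` maps a (true_class, predicted_class) pair to a weight,
    so that more dangerous confusions are penalized more heavily
    (unlisted error pairs default to weight 1.0).
    """
    errors = [severity.get((t, p), 1.0)
              for t, p in zip(y_true, y_pred) if t != p]
    return sum(errors) / len(y_true)

def adjust_learning_rate(lr, score, prev_score, factor=0.5, min_lr=1e-5):
    """Shrink the learning rate when the measure fails to improve."""
    if score >= prev_score:  # lower score is better; no improvement
        return max(lr * factor, min_lr)
    return lr

# Usage: confusing class 0 for class 1 is twice as severe as other errors.
severity = {(0, 1): 2.0}
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
score = weighted_error_score(y_true, y_pred, severity)  # (2.0) / 4 = 0.5
lr = adjust_learning_rate(0.1, score, prev_score=0.4)   # worsened -> 0.05
```

This captures the shape of the approach described above, where a single scalar both encodes per-error human concern and drives training-time adjustments, but the real measure operates on probabilistic predictions with a confidence-concession threshold rather than hard labels.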