Many classification models produce a probability distribution as the outcome of a prediction. This information is generally compressed down to the single class with the highest associated probability. In this paper, we argue that part of the information discarded in this process can in fact be used to further evaluate the goodness of models, and in particular the confidence with which each prediction is made. As an application of the ideas presented in this paper, we provide a theoretical explanation of a confidence-degradation phenomenon observed in the complement approach to the (Bernoulli) Naive Bayes generative model.
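To make the point concrete, the following is a minimal sketch (not the paper's own code) of the compression step described above: a classifier's predicted probability vector is usually collapsed to its arg-max class, while the probability assigned to that class, which would otherwise be discarded, can be read as a per-prediction confidence signal. The `predict_proba`-style array and the confidence measure shown here are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np

def hard_prediction(proba):
    """Return the single class with the highest predicted probability."""
    return int(np.argmax(proba))

def prediction_confidence(proba):
    """Confidence of the arg-max prediction: the probability assigned to it."""
    return float(np.max(proba))

# Example: a three-class prediction from some probabilistic classifier.
proba = np.array([0.10, 0.55, 0.35])
print(hard_prediction(proba))        # 1    -> the information usually kept
print(prediction_confidence(proba))  # 0.55 -> part of what is usually discarded
```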