It is well known that accurate probabilistic predictors can be trained through empirical risk minimisation with proper scoring rules as loss functions. While such learners capture the so-called aleatoric uncertainty of predictions, various machine learning methods have recently been developed with the goal of letting the learner also represent its epistemic uncertainty, i.e., the uncertainty caused by a lack of knowledge and data. An emerging branch of the literature proposes the use of a second-order learner that provides predictions in terms of distributions on probability distributions. However, recent work has revealed serious theoretical shortcomings of second-order predictors based on loss minimisation. In this paper, we generalise these findings and prove a more fundamental result: there seems to be no loss function that provides an incentive for a second-order learner to faithfully represent its epistemic uncertainty in the same manner as proper scoring rules do for standard (first-order) learners. As the main mathematical tool to prove this result, we introduce the generalised notion of second-order scoring rules.
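As background, the notion of a proper scoring rule invoked above can be sketched as follows; the notation ($\ell$, $p$, $p^*$, $y$) is illustrative and not taken from the paper. A loss $\ell(p, y)$ on predicted distributions $p \in \Delta_K$ over $K$ outcomes is called proper if the data-generating distribution $p^*$ minimises the expected loss,

\[
\mathbb{E}_{y \sim p^*}\big[\ell(p^*, y)\big] \;\le\; \mathbb{E}_{y \sim p^*}\big[\ell(p, y)\big] \quad \text{for all } p \in \Delta_K,
\]

and strictly proper if equality holds only for $p = p^*$. Standard strictly proper examples are the log loss $\ell(p, y) = -\log p_y$ and the Brier score $\ell(p, y) = \sum_{k=1}^{K} \big(p_k - \mathbf{1}[y = k]\big)^2$; under empirical risk minimisation, both reward the learner for reporting its true predictive beliefs.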
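By analogy, and as a rough illustration of the second-order setting (a sketch under assumed notation, not the paper's definitions), a second-order learner predicts a distribution over the probability simplex,

\[
Q \in \mathcal{P}(\Delta_K), \quad \text{e.g. } Q = \mathrm{Dir}(\alpha_1, \dots, \alpha_K), \qquad L : \mathcal{P}(\Delta_K) \times \mathcal{Y} \to \mathbb{R},
\]

where $L$ is a candidate second-order loss evaluated on the predicted $Q$ and the observed outcome $y \in \mathcal{Y}$. The spread of $Q$ is meant to encode epistemic uncertainty; the impossibility result stated above concerns the absence of any such $L$ that incentivises reporting this spread faithfully, in the way propriety of $\ell$ incentivises faithful first-order predictions.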