Reliable probability estimation is of crucial importance in many real-world applications where there is inherent uncertainty, such as weather forecasting, medical prognosis, or collision avoidance in autonomous vehicles. Probability-estimation models are trained on observed outcomes (e.g. whether it has rained or not, or whether a patient has died or not), because the ground-truth probabilities of the events of interest are typically unknown. The problem is therefore analogous to binary classification, with the important difference that the objective is to estimate probabilities rather than predicting the specific outcome. The goal of this work is to investigate probability estimation from high-dimensional data using deep neural networks. There exist several methods to improve the probabilities generated by these models but they mostly focus on classification problems where the probabilities are related to model uncertainty. In the case of problems with inherent uncertainty, it is challenging to evaluate performance without access to ground-truth probabilities. To address this, we build a synthetic dataset to study and compare different computable metrics. We evaluate existing methods on the synthetic data as well as on three real-world probability estimation tasks, all of which involve inherent uncertainty: precipitation forecasting from radar images, predicting cancer patient survival from histopathology images, and predicting car crashes from dashcam videos. Finally, we also propose a new method for probability estimation using neural networks, which modifies the training process to promote output probabilities that are consistent with empirical probabilities computed from the data. The method outperforms existing approaches on most metrics on the simulated as well as real-world data.
翻译:可靠概率估算对于许多存在内在不确定性的现实应用,如天气预报、医学预测或自动车辆避免碰撞等,具有至关重要的意义。概率估测模型在观测结果(例如,是否下雨,病人是否死亡)方面受过培训,因为人们通常不知道感兴趣的事件的地面概率。因此,问题与二进制分类相似,其重要区别在于,目标是估算概率而不是预测具体结果。这项工作的目标是利用深层神经网络调查从高维数据中得出的概率估算。存在几种方法来改进这些模型产生的概率,但它们主要侧重于与模型不确定性相关的分类问题。在存在固有的不确定性的情况下,评估业绩是困难的。为了解决这个问题,我们建立一个合成数据集,用来研究并比较不同的可比较指标。我们用最深层神经性的数据来调查高维度数据。我们评估现有方法,从三个真实-世界的概率模型中推导出准确性数据,从真实-概率估算中推导出一个精确性数据,从最后的概率估算,从真实-世界的模型中推导出一个精确性的数据,从真实性模型到最新的预测,从真实性模型中推导出一个精确性数据。