Predictive uncertainty estimation is an essential next step for the reliable deployment of deep object detectors in safety-critical tasks. In this work, we focus on estimating predictive distributions for bounding box regression output with variance networks. We show that in the context of object detection, training variance networks with negative log likelihood (NLL) can lead to high entropy predictive distributions regardless of the correctness of the output mean. We propose to use the energy score as a non-local proper scoring rule and find that when used for training, the energy score leads to better calibrated and lower entropy predictive distributions than NLL. We also address the widespread use of metrics that are not proper scoring rules for evaluating predictive distributions from deep object detectors by proposing an alternative evaluation approach founded on proper scoring rules. Using the proposed evaluation tools, we show that although variance networks can be used to produce high quality predictive distributions, the ad-hoc approaches used by seminal object detectors to choose regression targets during training do not provide wide enough data support for reliable variance learning. We hope that our work helps shift evaluation in probabilistic object detection to better align with predictive uncertainty evaluation in other machine learning domains. Code for all models, evaluation, and datasets is available at: https://github.com/asharakeh/probdet.git.
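The two training objectives contrasted in the abstract, NLL and the energy score, can be illustrated for a single bounding box whose predictive distribution is a Gaussian with diagonal covariance. This is a minimal NumPy sketch under our own assumptions: the function names, the 4-coordinate box parameterization, the choice β = 1, and the plain Monte Carlo estimator of the energy score are illustrative choices, not necessarily the estimator used in the authors' repository. The energy score used here is ES(F, z) = E||X - z|| - 0.5 E||X - X'||, with X and X' drawn independently from the predictive distribution F.

```python
import numpy as np


def gaussian_nll(mean, sigma, target):
    """Negative log likelihood of `target` under a diagonal Gaussian.

    mean, sigma, target: array-likes of shape (D,), e.g. D = 4 box coordinates.
    """
    mean, sigma, target = (np.asarray(a, dtype=float) for a in (mean, sigma, target))
    var = sigma ** 2
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (target - mean) ** 2 / var)


def energy_score(samples, target):
    """Monte Carlo estimate of the energy score with beta = 1 (illustrative).

    ES(F, z) = E||X - z|| - 0.5 * E||X - X'||, with X, X' ~ F independent.
    samples: (M, D) draws from the predictive distribution F; target: (D,).
    """
    samples = np.asarray(samples, dtype=float)
    target = np.asarray(target, dtype=float)
    m = samples.shape[0]
    # First term: mean Euclidean distance from each sample to the observed target.
    term1 = np.mean(np.linalg.norm(samples - target, axis=1))
    # Second term: mean pairwise distance between distinct samples (i != j).
    # The diagonal of the distance matrix is zero, so summing all entries and
    # dividing by M(M - 1) averages over distinct pairs only.
    pairwise = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    term2 = pairwise.sum() / (m * (m - 1))
    return term1 - 0.5 * term2


# Usage: a hypothetical ground-truth box (x1, y1, x2, y2) scored against a
# predictive Gaussian with a correct mean and with a mean shifted by 10 px.
rng = np.random.default_rng(0)
target = np.array([10.0, 20.0, 50.0, 80.0])
sigma = np.ones(4)
for mean in (target, target + 10.0):
    samples = rng.normal(mean, sigma, size=(1000, 4))
    print(gaussian_nll(mean, sigma, target), energy_score(samples, target))
```

One property worth noting: for a fixed mean error e in a single coordinate, the NLL term is minimized at σ = |e|, so a wrong mean can be partly compensated by widening the predicted distribution. The energy score, by contrast, is computed from distances between samples and the target rather than from the density evaluated at the target, which is what "non-local" refers to.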