Uncertainty quantification in neural networks promises to increase the safety of AI systems, but it is not clear how its performance varies with training set size. In this paper we evaluate seven uncertainty methods on Fashion MNIST and CIFAR10, sub-sampling each dataset to produce training sets of varying size. We find that calibration error and out-of-distribution detection performance depend strongly on training set size, with most methods being miscalibrated on the test set when trained on small training sets. Gradient-based methods seem to estimate epistemic uncertainty poorly and are the most affected by training set size. We expect our results to guide future research into uncertainty quantification and to help practitioners select methods based on the data they have available.
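To make the evaluation protocol concrete, the following is a minimal sketch of the two ingredients the abstract names: sub-sampling a training set and measuring calibration error on held-out predictions. It rests on assumptions not stated in the abstract: the sub-sampling is class-stratified, and calibration error is measured as expected calibration error (ECE) with 15 equal-width confidence bins. The function names `subsample_indices` and `expected_calibration_error` are hypothetical and chosen for illustration only.

```python
import numpy as np


def subsample_indices(labels, fraction, seed=0):
    """Return sorted indices of a class-stratified random subsample.

    Stratifying by class is an assumption of this sketch; the abstract
    only says that the training set is sub-sampled.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Keep at least one example per class.
        n = max(1, int(round(fraction * idx.size)))
        keep.append(rng.choice(idx, size=n, replace=False))
    return np.sort(np.concatenate(keep))


def expected_calibration_error(probs, labels, n_bins=15):
    """Expected calibration error with equal-width confidence bins.

    probs:  (N, C) array of predicted class probabilities.
    labels: (N,) array of integer class labels.
    The choice of ECE and of 15 bins is an assumption of this sketch.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidence = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each confidence to a bin (edges[i], edges[i + 1]].
    bin_ids = np.digitize(confidence, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            # Weight each bin's gap by the fraction of samples it holds.
            ece += mask.mean() * gap
    return ece
```

In a study of this kind, one would train a model on `train_x[subsample_indices(train_y, fraction)]` for each fraction of interest and then call `expected_calibration_error` on the test-set predictions to see how calibration changes with training set size.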