Being uncertain when facing the unknown is key to intelligent decision making. However, machine learning algorithms lack reliable estimates of their predictive uncertainty. This leads to wrong and overly confident decisions when encountering classes unseen during training. Despite the importance of equipping classifiers with uncertainty estimates ready for the real world, prior work has focused on small datasets with little or no class discrepancy between training and testing data. To close this gap, we introduce UIMNET: a realistic, ImageNet-scale test-bed to evaluate predictive uncertainty estimates for deep image classifiers. Our benchmark provides implementations of eight state-of-the-art algorithms, six uncertainty measures, four in-domain metrics, three out-domain metrics, and a fully automated pipeline to train, calibrate, ensemble, select, and evaluate models. Our test-bed is open-source and all of our results are reproducible from a fixed commit in our repository. Adding new datasets, algorithms, measures, or metrics is a matter of a few lines of code, and we hope that UIMNET becomes a stepping stone towards realistic, rigorous, and reproducible research in uncertainty estimation. Our results show that ensembles of ERM classifiers and single MIMO classifiers are the two best alternatives currently available for measuring uncertainty about both in-domain and out-domain classes.