As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with their predictions. However, most work on uncertainty has focused on traditional probabilistic or ranking approaches, in which the model assigns low probabilities or scores to uncertain examples. While this captures which examples are challenging for the model, it does not capture the underlying source of the uncertainty. In this work, we seek to identify the examples the model is uncertain about and to characterize the source of that uncertainty. We explore the benefits of a targeted intervention: data augmentation, applied over the course of training, of the examples on which the model is uncertain. We investigate whether the rate of learning in the presence of additional information differs between atypical and noisy examples. Our results show that it does, suggesting that well-designed interventions over the course of training can be an effective way to characterize and distinguish between different sources of uncertainty.