Feature attribution is widely used in interpretable machine learning to explain how influential each measured input feature value is for a model's output. However, measurements can be uncertain, and it is unclear how awareness of input uncertainty affects trust in explanations. We propose and study two approaches to help users manage their perception of uncertainty in a model explanation: 1) transparently show uncertainty in feature attributions so that users can reflect on it, and 2) suppress attribution to features with uncertain measurements, shifting attribution to other features, by regularizing with an uncertainty penalty. Through simulation experiments, qualitative interviews, and quantitative user evaluations, we identified the benefits of moderately suppressing attribution uncertainty and concerns about showing attribution uncertainty. This work adds to the understanding of handling and communicating uncertainty for model interpretability.
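To make the second approach concrete, the sketch below illustrates one plausible form of an uncertainty penalty: augmenting the task loss with a term that penalizes input-gradient attribution mass on features with high measurement uncertainty, so training shifts reliance toward more reliably measured features. This is a minimal illustration under stated assumptions, not the paper's implementation; the attribution choice (input gradients), the known per-feature uncertainty vector `feature_uncertainty`, and the weight `lam` are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's method) of suppressing
# attribution to uncertain features via an uncertainty penalty.
import torch
import torch.nn as nn

torch.manual_seed(0)

n_features = 5
model = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, 1))

# Per-feature measurement uncertainty (higher = less reliable);
# assumed known a priori for this illustration.
feature_uncertainty = torch.tensor([0.0, 0.0, 0.0, 1.0, 1.0])
lam = 0.1  # penalty strength; "moderate suppression" corresponds to a moderate lam

x = torch.randn(32, n_features, requires_grad=True)  # toy batch
y = torch.randn(32, 1)

pred = model(x)
task_loss = nn.functional.mse_loss(pred, y)

# Input-gradient attributions (one simple attribution method among many).
grads = torch.autograd.grad(pred.sum(), x, create_graph=True)[0]

# Penalize attribution on uncertain features; the model is thereby
# encouraged to shift attribution to other (more certain) features.
penalty = (feature_uncertainty * grads.abs()).mean()

loss = task_loss + lam * penalty
loss.backward()  # parameter gradients now discourage reliance on uncertain features
```

In this framing, `lam = 0` recovers ordinary training, and increasing `lam` trades task fit for reduced attribution to uncertain inputs, mirroring the moderate-suppression regime the evaluations found beneficial.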