Counterfactual explanations (CEs) are a practical tool for explaining why machine learning classifiers make particular decisions. For CEs to be useful, it is important that they are easy for users to interpret. Existing methods for generating interpretable CEs rely on auxiliary generative models, which may not be suitable for complex datasets and which incur engineering overhead. We introduce a simple and fast method for generating interpretable CEs in a white-box setting without an auxiliary model, by using the predictive uncertainty of the classifier. Our experiments show that our proposed algorithm generates CEs that are more interpretable, as measured by IM1 scores, than those of existing methods. Additionally, our approach allows us to estimate the uncertainty of a CE, which may be important in safety-critical applications such as those in the medical domain.
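The abstract describes the approach only at a high level. The sketch below is a minimal, hypothetical illustration of the core idea: searching for a counterfactual with gradient steps that push a classifier's prediction toward a target class while keeping predictive uncertainty low, here approximated with a deep ensemble (low average cross-entropy to the target across members implies both confident and consistent predictions). PyTorch, the `generate_ce` name, and all hyperparameters are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def generate_ce(ensemble, x, target, steps=100, step_size=0.01):
    """Sketch of uncertainty-aware counterfactual search.

    Nudges input `x` toward `target` by minimising the mean
    cross-entropy of an ensemble of classifiers; a low ensemble
    loss implies low aleatoric uncertainty (confident members)
    and low epistemic uncertainty (agreeing members).
    """
    x_cf = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        # Average cross-entropy to the target class over all members.
        loss = torch.stack(
            [F.cross_entropy(m(x_cf), target) for m in ensemble]
        ).mean()
        loss.backward()
        with torch.no_grad():
            # Signed-gradient step keeps per-pixel changes small.
            x_cf -= step_size * x_cf.grad.sign()
            x_cf.clamp_(0.0, 1.0)  # stay in a valid input range
        x_cf.grad.zero_()
    return x_cf.detach()
```

The final ensemble disagreement at `x_cf` (e.g. the variance of member predictions) could then serve as the kind of CE uncertainty estimate the abstract alludes to.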