Counterfactual explanations (CEs) are a powerful means for understanding how decisions made by algorithms can be changed. Researchers have proposed a number of desiderata that CEs should meet to be practically useful, such as requiring minimal effort to enact, or complying with causal models. We consider a further aspect that improves the usability of CEs: robustness to adverse perturbations, which may naturally arise due to unfortunate circumstances. Since CEs typically prescribe a sparse form of intervention (i.e., only a subset of the features should be changed), we study the effect of addressing robustness separately for the features that are recommended to be changed and those that are not. Our definitions are workable in that they can be incorporated as penalty terms in the loss functions used to discover CEs. To experiment with robustness, we create and release code in which five data sets (commonly used in the field of fair and explainable machine learning) have been enriched with feature-specific annotations that can be used to sample meaningful perturbations. Our experiments show that CEs are often not robust and, if adverse perturbations take place (even if not worst-case), the intervention they prescribe may require a much larger cost than anticipated, or even become impossible. However, accounting for robustness in the search process, which can be done rather easily, allows robust CEs to be discovered systematically. Robust CEs make the additional intervention needed to counteract perturbations much less costly than non-robust CEs. We also find that robustness is easier to achieve for the features to change, an important consideration when choosing which counterfactual explanation is best for the user. Our code is available at: https://github.com/marcovirgolin/robust-counterfactuals.