Explainable AI offers insights into what factors drive a certain prediction of a black-box AI system. One popular approach is counterfactual explanation, which goes beyond explaining why a system arrives at a certain decision to also suggest what a user can do to alter the outcome. A counterfactual example must satisfy various constraints to be useful for real-world applications. These constraints often trade off against one another, presenting fundamental challenges to existing works. To this end, we propose a stochastic learning-based framework that effectively balances the counterfactual trade-offs. The framework consists of a generation module and a feature selection module with complementary roles: the former models the distribution of valid counterfactuals, whereas the latter enforces additional constraints in a way that allows for differentiable training and amortized optimization. We demonstrate the effectiveness of our method in generating actionable and plausible counterfactuals that are more diverse than those of existing methods and, in particular, more efficient than the closest baselines.
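To make the two-module design concrete, the following is a minimal, hypothetical sketch of how such a framework could be wired together. It is not the paper's implementation: the CVAE-style generator, the Gumbel-Softmax feature mask, and all module names, dimensions, and hyperparameters are illustrative assumptions.

```python
# Sketch only: a generation module that samples candidate counterfactuals and a
# feature selection module that keeps the sparse feature choice differentiable.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CounterfactualGenerator(nn.Module):
    """Models a distribution over candidate counterfactuals x' given x (CVAE-style, assumed)."""
    def __init__(self, x_dim: int, z_dim: int = 8, h_dim: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(x_dim + z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(torch.cat([x, z], dim=-1))                # candidate counterfactual

class FeatureSelector(nn.Module):
    """Produces a relaxed binary mask so only a sparse subset of features is changed;
    Gumbel-Softmax keeps the selection differentiable for end-to-end training."""
    def __init__(self, x_dim: int, temperature: float = 0.5):
        super().__init__()
        self.logits = nn.Linear(x_dim, 2 * x_dim)  # per-feature {keep, change} logits
        self.temperature = temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.logits(x).view(*x.shape, 2)
        mask = F.gumbel_softmax(logits, tau=self.temperature, hard=True)[..., 1]
        return mask  # 1 where a feature is allowed to change

# Amortized usage: both modules are trained once, then applied to any new input.
x = torch.randn(4, 10)                       # batch of factual inputs (10 features)
gen, sel = CounterfactualGenerator(10), FeatureSelector(10)
mask = sel(x)
x_cf = mask * gen(x) + (1.0 - mask) * x      # change only the selected features
```

In this reading, sparsity and actionability constraints live in the mask while validity and plausibility are handled by the generator, which mirrors the complementary roles described in the abstract.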