Explainable AI offers insight into which factors drive a particular prediction of a black-box AI system. One popular approach is counterfactual explanation, which goes beyond explaining why a system arrives at a certain decision to also suggest what a user can do to alter that outcome. A counterfactual example must satisfy several desiderata, and these desiderata often trade off against one another, posing substantial challenges to existing methods. We propose a stochastic, learning-based framework that effectively balances these counterfactual trade-offs. The framework consists of a generation module and a feature selection module with complementary roles: the former models the distribution of valid counterfactuals, whereas the latter enforces additional constraints in a way that allows for differentiable training and amortized optimization. We demonstrate that our method generates actionable and plausible counterfactuals that are more diverse than those of existing methods and, in particular, more efficient than the closest baselines.
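To make the two-module idea concrete, the sketch below shows one plausible instantiation in PyTorch, assuming a conditional-VAE-style generator and a Gumbel-Softmax feature mask; the class names, loss terms, and architecture choices here are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of a generation module + differentiable feature selection
# module for counterfactual search (not the paper's exact architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CFGenerator(nn.Module):
    """Models a distribution over candidate feature changes given an input x."""

    def __init__(self, dim, latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 2 * latent))
        self.dec = nn.Sequential(nn.Linear(latent + dim, 32), nn.ReLU(), nn.Linear(32, dim))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        delta = self.dec(torch.cat([z, x], dim=-1))            # proposed change per feature
        return delta, mu, logvar


class FeatureSelector(nn.Module):
    """Produces a relaxed binary mask selecting which features may change."""

    def __init__(self, dim, temperature=0.5):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(dim))
        self.temperature = temperature

    def forward(self, batch_size):
        logits = self.logits.expand(batch_size, -1)
        # Gumbel-Softmax relaxation keeps the mask differentiable during training.
        return F.gumbel_softmax(torch.stack([logits, -logits], dim=-1),
                                tau=self.temperature, hard=False)[..., 0]


def counterfactual_loss(clf, x, target, gen, sel, sparsity_weight=0.1):
    """Combines validity, proximity, and a KL regularizer into one objective."""
    delta, mu, logvar = gen(x)
    mask = sel(x.size(0))
    x_cf = x + mask * delta                                    # only selected features move
    validity = F.cross_entropy(clf(x_cf), target)              # push prediction to target class
    proximity = (mask * delta).abs().sum(-1).mean()            # stay close to the original input
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return validity + sparsity_weight * proximity + 1e-3 * kl
```

Training such a pipeline end to end would amortize the counterfactual search: at test time a single forward pass samples diverse candidates, rather than solving a per-instance optimization problem.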