Model interpretability has become an important problem in machine learning (ML) due to the increased effect that algorithmic decisions have on humans. Counterfactual explanations can help users understand not only why ML models make certain decisions, but also how these decisions can be changed. We frame the problem of finding counterfactual explanations as a gradient-based optimization task and extend previous work that could only be applied to differentiable models. In order to accommodate non-differentiable models such as tree ensembles, we use probabilistic model approximations in the optimization framework. We introduce an approximation technique that is effective for finding counterfactual explanations for predictions of the original model and show that our counterfactual examples are significantly closer to the original instances than those produced by other methods specifically designed for tree ensembles.
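To make the idea concrete, below is a minimal sketch (not the paper's exact formulation) of the core trick: a hard tree split is replaced by a sigmoid so the model becomes differentiable, and gradient descent then searches for a counterfactual that flips the prediction while staying close to the original instance. The toy stump and the names `sigma` (smoothing temperature) and `lam` (distance weight) are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def hard_stump(x, feature=0, threshold=0.5):
    # Original, non-differentiable tree node: predicts 1 iff x[feature] > threshold.
    return (x[feature] > threshold).astype(jnp.float32)

def soft_stump(x, feature=0, threshold=0.5, sigma=0.1):
    # Probabilistic approximation: a sigmoid replaces the hard split,
    # so gradients can flow through the "decision".
    return jax.nn.sigmoid((x[feature] - threshold) / sigma)

def counterfactual_loss(x_cf, x_orig, target=1.0, lam=0.1):
    # Push the smoothed prediction toward the target class while
    # keeping the counterfactual close to the original instance.
    pred = soft_stump(x_cf)
    return (pred - target) ** 2 + lam * jnp.sum((x_cf - x_orig) ** 2)

x_orig = jnp.array([0.3, 0.7])  # classified as 0 by the hard stump
x_cf = x_orig
grad_fn = jax.grad(counterfactual_loss)
for _ in range(200):
    x_cf = x_cf - 0.1 * grad_fn(x_cf, x_orig)

print(hard_stump(x_orig), hard_stump(x_cf))  # 0.0 -> 1.0: prediction flipped
print(x_cf)  # close to x_orig, but just past the threshold on feature 0
```

In a full tree ensemble the same smoothing is applied to every split, leaf outputs are combined as products of these soft split probabilities, and the counterfactual is validated against the original (hard) model after optimization.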