Counterfactual explanations describe how to modify a feature vector to flip the outcome of a trained classifier. Several heuristic and optimal methods have been proposed to generate these explanations. However, the robustness of counterfactual explanations when the classifier is retrained has yet to be studied. Our goal is to obtain counterfactual explanations for random forests that are robust to algorithmic uncertainty. We study the link between the robustness of ensemble models and the robustness of base learners, and frame the generation of robust counterfactual explanations as a chance-constrained optimization problem. We develop a practical method with good empirical performance and provide finite-sample and asymptotic guarantees for simple random forests of stumps. We show that existing methods give surprisingly low robustness: the validity of naive counterfactuals is below $50\%$ on most data sets and can fall to $20\%$ on large problem instances with many features. Even counterfactual explanations with high plausibility often exhibit low robustness to algorithmic uncertainty. In contrast, our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations. Furthermore, we highlight the connection between the robustness of counterfactual explanations and the predictive importance of features.
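To make the chance-constrained framing concrete, here is a minimal sketch of one plausible formulation; the notation ($x$ for the initial observation, $x'$ for the counterfactual, $y^\star$ for the desired class, $f_\theta$ for a classifier retrained under algorithmic randomness $\theta$, $d$ for a distance, and $\alpha$ for a risk level) is introduced purely for illustration and is not taken from the paper:
$$ \min_{x'} \; d(x, x') \quad \text{s.t.} \quad \mathbb{P}_{\theta}\!\left( f_{\theta}(x') = y^\star \right) \geq 1 - \alpha. $$
Under this reading, tightening the constraint (smaller $\alpha$) asks the counterfactual to remain valid across more retrained models, which is consistent with the reported trade-off: higher robustness at the cost of a small increase in the distance $d(x, x')$.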