Counterfactual explanations play an important role in detecting bias and improving the explainability of data-driven classification models. A counterfactual explanation (CE) is a minimally perturbed data point for which the decision of the model changes. Most existing methods can provide only a single CE, which may not be achievable for the user. In this work we derive an iterative method to compute robust CEs, i.e. CEs that remain valid even after the features are slightly perturbed. To this end, our method provides a whole region of CEs, allowing the user to choose a suitable recourse to obtain a desired outcome. We use algorithmic ideas from robust optimization and prove convergence results for the most common machine learning models, including logistic regression, decision trees, random forests, and neural networks. Our experiments show that our method can efficiently generate globally optimal robust CEs for a variety of common data sets and classification models.
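As a toy illustration of the robustness idea (this is a minimal sketch, not the paper's algorithm), consider a linear classifier f(x) = sign(w·x + b). Requiring a CE x' to stay on the negative side under every feature perturbation d with ‖d‖₂ ≤ ε is equivalent to the tightened constraint w·x' + b ≤ −ε‖w‖₂, for which the closest (L2) robust CE has a closed form. The function name and interface below are hypothetical:

```python
import numpy as np

def robust_ce_linear(x, w, b, eps):
    """Closest (L2-norm) robust counterfactual for a linear classifier
    f(x) = sign(w.x + b), assuming x is currently classified positive
    (w.x + b > 0). The CE must satisfy w.(x' + d) + b <= 0 for every
    perturbation ||d||_2 <= eps, i.e. w.x' + b <= -eps * ||w||_2."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    score = w @ x + b
    # Project x onto the halfspace {x' : w.x' + b <= -eps * ||w||_2}.
    shift = (score + eps * np.linalg.norm(w)) / (w @ w)
    return x - shift * w
```

For example, with x = (2, 0), w = (1, 0), b = 0 and ε = 0.5, the robust CE is (−0.5, 0): it sits margin ε‖w‖₂ past the decision boundary, so no perturbation of norm at most ε can flip it back. For the nonlinear models treated in the paper (decision trees, random forests, neural networks) no such closed form exists, which is where the iterative robust-optimization scheme comes in.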