Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research, providing recourse recommendations for users affected by the decisions of machine learning models. However, CEs found by existing methods often become invalid when slight changes occur in the parameters of the model they were generated for. The literature lacks a way to provide exhaustive robustness guarantees for CEs under model changes: existing methods for improving CEs' robustness are mostly heuristic, and their robustness is evaluated empirically on only a limited number of retrained models. To bridge this gap, we propose a novel interval abstraction technique for parametric machine learning models, which allows us to obtain provable robustness guarantees for CEs under a possibly infinite set of plausible model changes $\Delta$. Based on this idea, we formalise a robustness notion for CEs, which we call $\Delta$-robustness, in both binary and multi-class classification settings. We present procedures for verifying $\Delta$-robustness based on Mixed Integer Linear Programming, and we build on these procedures to propose algorithms that generate $\Delta$-robust CEs. In an extensive empirical study involving neural networks and logistic regression models, we demonstrate the practical applicability of our approach. We discuss two strategies for determining appropriate hyperparameters for our method, and we quantitatively benchmark the CEs generated by eleven methods, highlighting the effectiveness of our algorithms in finding robust CEs.
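To make the interval-abstraction idea concrete, the sketch below checks $\Delta$-robustness for the linear (logistic regression) case. It assumes $\Delta$ is an $\ell_\infty$ interval of radius `delta` around the trained weights and bias; this instantiation of $\Delta$ and the function name `is_delta_robust` are illustrative assumptions, not the paper's exact formulation (which covers neural networks via the MILP-based procedures).

```python
import numpy as np

def is_delta_robust(x_ce, w, b, delta, target_class=1):
    """Check whether counterfactual x_ce keeps the target class for *every*
    model (w', b') with |w' - w|_inf <= delta and |b' - b| <= delta.

    For a linear score w.x + b, the extremum over the interval
    [w - delta, w + delta] is attained coordinate-wise, so the check is exact:
        min_{w', b'} w'.x + b' = w.x + b - delta * (||x||_1 + 1)
        max_{w', b'} w'.x + b' = w.x + b + delta * (||x||_1 + 1)
    """
    score = w @ x_ce + b
    slack = delta * (np.abs(x_ce).sum() + 1.0)
    if target_class == 1:
        return score - slack > 0.0   # worst case still positive class
    return score + slack < 0.0       # best case still negative class

# Toy usage (illustrative numbers, not from the paper):
w, b = np.array([1.0, -2.0]), 0.5
x_ce = np.array([2.0, 0.5])                      # candidate counterfactual
print(is_delta_robust(x_ce, w, b, delta=0.1))    # True: valid on all of Delta
```

Because the interval bound is exact for a linear score, a `True` result certifies validity for every model in $\Delta$; for neural networks, the analogous certificate requires the Mixed Integer Linear Programming procedures described above.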