Counterfactual Explanations are becoming a de facto standard in post-hoc interpretable machine learning. For a given classifier and an instance classified into an undesired class, a counterfactual explanation is a small perturbation of that instance that changes the classification outcome. This work leverages Counterfactual Explanations to detect the important decision boundaries of a pre-trained black-box model. This information is then used to build a supervised discretization of the dataset's features with tunable granularity. On the discretized dataset, a smaller and therefore more interpretable Decision Tree can be trained, which in addition enhances the stability and robustness of the baseline Decision Tree. Numerical results on real-world datasets show the effectiveness of the approach in terms of accuracy and sparsity compared to the baseline Decision Tree.
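To illustrate the pipeline, the following is a minimal Python sketch using scikit-learn. The random forest standing in for the black box, the greedy single-feature counterfactual search, and the quantile-based summarization of boundary crossings into k cut points per feature are all illustrative assumptions, not the paper's actual components.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)  # stand-in black box

def counterfactual(x, model, step=0.05, max_steps=200):
    # Greedy one-feature-at-a-time search (an assumption, not the paper's
    # generator): push a single coordinate until the prediction flips.
    base = model.predict(x.reshape(1, -1))[0]
    for j in range(x.size):
        for sign in (1.0, -1.0):
            z = x.copy()
            for _ in range(max_steps):
                z[j] += sign * step
                if model.predict(z.reshape(1, -1))[0] != base:
                    return z, j
    return None, None

# Step 1: collect, per feature, the values at which counterfactuals cross
# the black-box decision boundary.
thresholds = {j: [] for j in range(X.shape[1])}
for x in X[:100]:
    z, j = counterfactual(x, model)
    if z is not None:
        thresholds[j].append(z[j])

# Step 2: supervised discretization with tunable granularity: keep at most
# k cut points per feature, summarizing the collected boundary crossings.
k = 3
X_disc = np.empty_like(X)
for j in range(X.shape[1]):
    cuts = (np.quantile(thresholds[j], np.linspace(0, 1, k + 2)[1:-1])
            if thresholds[j] else np.array([]))
    X_disc[:, j] = np.digitize(X[:, j], np.sort(cuts))

# Step 3: train a small, more interpretable Decision Tree on the
# discretized features.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_disc, y)
print("accuracy of the small tree on discretized data:", tree.score(X_disc, y))
```

In this sketch, k plays the role of the tunable granularity: fewer cut points per feature yield a coarser discretization and hence a sparser tree.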