This study investigates the impact of machine learning models on the generation of counterfactual explanations by conducting a benchmark evaluation over three different types of models: a decision tree (a fully transparent, interpretable, white-box model), a random forest (a semi-interpretable, grey-box model), and a neural network (a fully opaque, black-box model). We tested the counterfactual generation process using four algorithms from the literature (DiCE, WatcherCF, prototype, and GrowingSpheresCF) on five different datasets (COMPAS, Adult, German, Diabetes, and Breast Cancer). Our findings indicate that: (1) different machine learning models have no impact on the generation of counterfactual explanations; (2) counterfactual algorithms whose loss functions are based solely on proximity are not actionable and do not provide meaningful explanations; (3) meaningful evaluation results cannot be obtained without guaranteeing plausibility in the counterfactual generation process, and algorithms that do not consider plausibility in their internal mechanisms lead to biased and unreliable conclusions when evaluated with the current state-of-the-art metrics; (4) a qualitative analysis is strongly recommended (together with a quantitative analysis) to ensure a robust analysis of counterfactual explanations and the potential identification of biases.
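To make the benchmark setup concrete, the following is a minimal sketch (not the authors' benchmark code) of how counterfactuals could be generated for the three model classes on one of the five datasets (Breast Cancer) using the open-source dice-ml package with scikit-learn models; the dataset choice, hyperparameters, and the "random" generation method are illustrative assumptions, not the exact configuration used in the study.

```python
# Minimal sketch, assuming the dice-ml and scikit-learn packages are installed.
# Illustrates generating counterfactuals for a white-box, grey-box, and
# black-box model on the Breast Cancer dataset; not the study's actual code.
import dice_ml
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# One of the five benchmark datasets (Breast Cancer), loaded as a DataFrame.
data = load_breast_cancer(as_frame=True)
df = data.frame.rename(columns={"target": "label"})
train_df, test_df = train_test_split(df, test_size=0.2, random_state=0)

# DiCE needs a data description: continuous features and the outcome column.
d = dice_ml.Data(
    dataframe=train_df,
    continuous_features=[c for c in train_df.columns if c != "label"],
    outcome_name="label",
)

# The three model types compared in the study (illustrative hyperparameters).
models = {
    "decision_tree (white-box)": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest (grey-box)": RandomForestClassifier(n_estimators=100, random_state=0),
    "neural_network (black-box)": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}

query = test_df.drop(columns="label").iloc[:1]  # one instance to explain

for name, clf in models.items():
    clf.fit(train_df.drop(columns="label"), train_df["label"])
    m = dice_ml.Model(model=clf, backend="sklearn")
    exp = dice_ml.Dice(d, m, method="random")  # model-agnostic sampling variant
    cfs = exp.generate_counterfactuals(query, total_CFs=4, desired_class="opposite")
    print(f"--- {name} ---")
    cfs.visualize_as_dataframe(show_only_changes=True)
```

The same pattern would be repeated per dataset and per counterfactual algorithm; proximity, actionability, and plausibility would then be assessed on the generated counterfactuals, which is where the findings above apply.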