With the rapid development of eXplainable Artificial Intelligence (XAI), a long line of past work has raised concerns that perturbation-based post-hoc XAI models suffer from the Out-of-Distribution (OOD) problem and that their explanations are socially misaligned. We explore the limitations of post-hoc explanation methods that use approximators to mimic the behavior of black-box models. We then propose eXplanation-based Counterfactual Retraining (XCR), which extracts feature importance quickly. XCR applies the explanations generated by the XAI model as counterfactual inputs to retrain the black-box model, addressing the OOD and social misalignment problems. Evaluation on popular image datasets shows that XCR improves model performance while retaining only 12.5% of the most crucial features, without changing the black-box model structure. Furthermore, evaluation on corruption benchmark datasets shows that XCR is very helpful for improving model robustness and has a positive impact on OOD calibration. Even without calibration on the validation set, which some OOD calibration methods require, XCR outperforms existing methods on the corrupted-data metric. When calibration on the validation set is applied, our method also beats current OOD calibration methods on the OOD calibration metric.
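The following is a minimal sketch of the counterfactual-retraining idea described above, assuming a gradient-based saliency map as the XAI explanation; the function name `xcr_step`, the `keep_ratio` parameter, and the training-loop details are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of XCR-style counterfactual retraining (not the
# paper's exact algorithm): explain, mask to the top features, retrain.
import torch
import torch.nn.functional as F

def xcr_step(model, optimizer, images, labels, keep_ratio=0.125):
    # 1) Explain: use input-gradient magnitude as a stand-in
    #    feature-importance map (any XAI attribution could be used).
    images = images.clone().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grads = torch.autograd.grad(loss, images)[0]
    importance = grads.abs().sum(dim=1, keepdim=True)  # (B, 1, H, W)

    # 2) Counterfactual input: keep only the top `keep_ratio` most
    #    important pixels (e.g. 12.5%), zeroing out the rest.
    flat = importance.flatten(1)
    k = max(1, int(keep_ratio * flat.shape[1]))
    thresh = flat.topk(k, dim=1).values[:, -1].view(-1, 1, 1, 1)
    masked = images.detach() * (importance >= thresh).float()

    # 3) Retrain: update the unchanged black-box model on the
    #    counterfactual (masked) input.
    optimizer.zero_grad()
    retrain_loss = F.cross_entropy(model(masked), labels)
    retrain_loss.backward()
    optimizer.step()
    return retrain_loss.item()
```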