Currently, a significant amount of research in artificial intelligence is devoted to improving the explainability and interpretability of deep learning models. It has been found that end-users find it easier to trust a system when they understand why it produced a particular output. Recommender systems are one example of systems for which great effort has been devoted to making the output more explainable. One method for producing more explainable output is counterfactual reasoning, which alters a minimal set of features to generate a counterfactual item that changes the output of the system. This process allows the identification of input features that have a significant impact on the desired output, leading to effective explanations. In this paper, we present a method for generating counterfactual explanations for both tabular and textual features. We evaluated the performance of our proposed method on three real-world datasets and demonstrated a +5\% improvement in finding effective features (based on model-based measures) compared to the baseline method.
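To illustrate the general idea of counterfactual reasoning described above (not the specific method proposed in this paper), the following minimal sketch performs a greedy counterfactual search over tabular features: it perturbs as few features as possible until a trained classifier's prediction flips, and the changed features serve as the explanation. The synthetic dataset, logistic-regression model, and greedy search strategy are illustrative assumptions only.

```python
# Illustrative sketch of counterfactual explanation, NOT the paper's method.
# Greedily move individual features toward the opposite-class mean until the
# model's prediction flips; the altered features form the explanation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical setup: small synthetic tabular data and a simple binary classifier.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
model = LogisticRegression().fit(X, y)


def greedy_counterfactual(x, model, X_ref, y_ref, max_changes=3):
    """Return a counterfactual for x and the indices of the changed features.

    Features are replaced one at a time with the mean value of the opposite
    class until the predicted label flips (greedy, not guaranteed minimal).
    """
    original_label = model.predict(x.reshape(1, -1))[0]
    target_mean = X_ref[y_ref != original_label].mean(axis=0)  # per-feature targets
    cf, changed = x.copy(), []
    # Try the features whose values are farthest from the target-class mean first.
    for i in np.argsort(-np.abs(x - target_mean)):
        cf[i] = target_mean[i]
        changed.append(int(i))
        if model.predict(cf.reshape(1, -1))[0] != original_label:
            return cf, changed  # prediction flipped: counterfactual found
        if len(changed) >= max_changes:
            break
    return None, changed  # no counterfactual within the change budget


cf, changed = greedy_counterfactual(X[0], model, X, y)
print("changed feature indices:", changed)
```

In practice, counterfactual explanation methods replace this greedy heuristic with an optimization that balances the number and magnitude of feature changes against the confidence of the flipped prediction.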