Explainable Artificial Intelligence (XAI) is a set of techniques that allows the understanding of both technical and non-technical aspects of Artificial Intelligence (AI) systems. XAI is crucial to help satisfy the increasingly important demand for \emph{trustworthy} Artificial Intelligence, characterized by fundamental principles such as respect for human autonomy, prevention of harm, transparency, and accountability. Within XAI techniques, counterfactual explanations aim to provide end users with a set of features (and their corresponding values) that need to be changed in order to achieve a desired outcome. Current approaches rarely take into account the feasibility of the actions needed to achieve the proposed explanations, and in particular they fall short of considering the causal impact of such actions. In this paper, we present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology that generates counterfactual explanations capturing by design the underlying causal relations in the data, and at the same time provides feasible recommendations to reach the proposed profile. Moreover, our methodology has the advantage that it can be set on top of existing counterfactual generator algorithms, thus minimizing the complexity of imposing additional causal constraints. We demonstrate the effectiveness of our approach with a set of experiments on synthetic and real datasets (including a proprietary dataset from the financial domain).
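The core mechanism can be illustrated with a minimal sketch. Assuming an additive-noise structural causal model (SCM), an individual's features are first mapped to their exogenous (latent) variables by abduction; a generic counterfactual search is then run on the composition of the structural equations and the original classifier, and the resulting latent counterfactual is decoded back to feature space, so that the recommendation respects the causal relations by construction. The two-variable SCM, the function names, and the greedy search below are illustrative placeholders, not the paper's implementation; a real pipeline would plug in a trained model and an existing off-the-shelf counterfactual generator.

\begin{verbatim}
import numpy as np

# Hypothetical two-variable additive-noise SCM (names are illustrative):
# X1 = U1 (root), X2 = 0.5 * X1 + U2 (child).
def scm_decode(u):
    """Structural equations F: map latent variables u to features x."""
    x1 = u[0]
    x2 = 0.5 * x1 + u[1]
    return np.array([x1, x2])

def scm_encode(x):
    """Abduction: recover the exogenous variables of an individual."""
    return np.array([x[0], x[1] - 0.5 * x[0]])

# Stand-in classifier on feature space (in practice, any trained model).
def classifier(x):
    return 1.0 if x[0] + x[1] > 2.0 else 0.0

# CEILS-style composition: a classifier acting directly on latent space,
# so a counterfactual found here is an intervention on exogenous variables.
def latent_classifier(u):
    return classifier(scm_decode(u))

# Toy counterfactual generator: greedy coordinate ascent in latent space.
# A real pipeline would substitute an existing generator at this step.
def find_latent_counterfactual(u, target=1.0, step=0.1, max_iter=200):
    u_cf = u.copy()
    for _ in range(max_iter):
        if latent_classifier(u_cf) == target:
            return u_cf
        scores = [latent_classifier(u_cf + step * e)
                  for e in np.eye(len(u_cf))]
        u_cf += step * np.eye(len(u_cf))[int(np.argmax(scores))]
    return u_cf

x = np.array([1.0, 0.4])              # rejected individual
u = scm_encode(x)                     # move to latent space
u_cf = find_latent_counterfactual(u)  # intervene on latent variables
x_cf = scm_decode(u_cf)               # causally consistent counterfactual
print(f"original x = {x}, counterfactual x = {x_cf}")
\end{verbatim}

Note how the decoded counterfactual changes both features even though the search only moved one latent coordinate: raising the exogenous income term propagates to savings through the structural equation, which is precisely the downstream causal effect that feature-space counterfactual generators ignore.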