One of the most pressing issues in AI in recent years has been the lack of explainability of many of its models. We focus on explanations for discrete Bayesian network classifiers (BCs), targeting greater transparency of their inner workings by including intermediate variables in explanations, rather than only the input and output variables as is standard practice. The proposed influence-driven explanations (IDXs) for BCs are systematically generated from the causal relationships between variables within the BC, called influences, which are then categorised by logical requirements, called relation properties, according to their behaviour. These relation properties both provide guarantees beyond those of heuristic explanation methods and allow the information underpinning an explanation to be tailored to the requirements of a particular context and user, e.g., IDXs may be dialectical or counterfactual. We demonstrate IDXs' capability to explain various forms of BCs, e.g., naive or multi-label, binary or categorical, and also to integrate recent approaches to explanations for BCs from the literature. We evaluate IDXs with theoretical and empirical analyses, demonstrating their considerable advantages over existing explanation methods.
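To make the core idea concrete, the following is a minimal sketch of extracting influences from a naive Bayes BC and categorising one with a simple relation property. The influence notion here (a feature whose observation monotonically raises the posterior of a class) and the `supports` check are illustrative simplifications assumed for this sketch, not the paper's formal IDX definitions.

```python
# Sketch: a tiny naive Bayes classifier (BC) over binary variables, plus an
# illustrative "relation property" that categorises each feature's influence
# on the class. All definitions here are assumptions for illustration only.
from itertools import product

prior = {0: 0.5, 1: 0.5}            # P(C)
cpt = {                             # P(Xi = 1 | C)
    "X1": {0: 0.2, 1: 0.8},
    "X2": {0: 0.6, 1: 0.3},
}

def posterior(obs):
    """P(C = 1 | obs), where obs maps feature name -> 0/1."""
    scores = {}
    for c in (0, 1):
        p = prior[c]
        for x, v in obs.items():
            p_x1 = cpt[x][c]
            p *= p_x1 if v == 1 else 1.0 - p_x1
        scores[c] = p
    return scores[1] / (scores[0] + scores[1])

def supports(x, others):
    """Illustrative relation property: X monotonically supports C = 1 if
    flipping X from 0 to 1 never lowers the posterior of C = 1, regardless
    of how the other features are observed."""
    for vals in product((0, 1), repeat=len(others)):
        base = dict(zip(others, vals))
        if posterior({**base, x: 1}) < posterior({**base, x: 0}):
            return False
    return True

for x in cpt:
    others = [y for y in cpt if y != x]
    rel = "supports" if supports(x, others) else "does not monotonically support"
    print(f"{x} {rel} C = 1")
```

On this toy model the sketch reports that X1 supports C = 1 (observing X1 = 1 is more likely under C = 1) while X2 does not; a dialectical IDX in the paper's sense would, analogously, present such categorised influences, including those through intermediate variables, as the explanation.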