Interpretable Representations in Explainable AI: From Theory to Practice

Interpretable representations are the backbone of many explainers designed for black-box predictive systems based on artificial intelligence and machine learning algorithms. They translate the low-level data representation necessary for good predictive performance into high-level human-intelligible concepts used to convey the explanatory insights. Notably, the explanation type and its cognitive complexity are directly controlled by the interpretable representation, which allows an explanation to be tailored to a particular audience and use case. However, many explainers built upon interpretable representations overlook their merit and fall back on default solutions that often carry implicit assumptions, thereby degrading the explanatory power and reliability of such techniques. To address this problem, we study properties of interpretable representations that encode the presence and absence of human-comprehensible concepts. We show how they are operationalised for tabular, image and text data; discuss their assumptions, strengths and weaknesses; identify their core building blocks; and scrutinise their parameterisation. In particular, this in-depth analysis allows us to pinpoint their explanatory properties, desiderata and scope for (malicious) manipulation in the context of tabular data, where a linear model is used to quantify the influence of interpretable concepts on a black-box prediction. Our findings support a range of recommendations for designing trustworthy interpretable representations; specifically, the benefits of class-aware (supervised) discretisation of tabular data, e.g., with decision trees, and the sensitivity of image interpretable representations to segmentation granularity and occlusion colour.
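To make the tabular case concrete, the following is a minimal sketch of a class-aware discretisation followed by a binary presence/absence encoding. It is an illustration under stated assumptions, not the implementation studied in this work: a single Gini-based split per feature (a depth-one decision tree) stands in for a full decision-tree discretiser, the function names `gini`, `best_threshold` and `to_interpretable` are hypothetical, and the toy data are invented. The linear surrogate model that would be fitted on the binary vectors is left out.

```python
from bisect import bisect_left


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Class-aware split of one feature: the midpoint between adjacent
    distinct values that minimises the weighted Gini impurity.
    Assumption: a depth-one tree stands in for a full tree discretiser."""
    pairs = sorted(zip(values, labels))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no valid split between identical values
        t = (lo + hi) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def to_interpretable(sample, explained, thresholds):
    """Binary representation relative to the explained instance:
    1 where `sample` falls into the same bin as `explained` for a
    feature (concept present), 0 otherwise (concept absent)."""
    return [
        int(bisect_left(th, s) == bisect_left(th, e))
        for s, e, th in zip(sample, explained, thresholds)
    ]


# Hypothetical toy data: two numerical features, binary class labels.
X = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [7.0, 40.0], [8.0, 15.0], [9.0, 35.0]]
y = [0, 0, 0, 1, 1, 1]

# One supervised threshold per feature, learnt from the labels.
thresholds = [[best_threshold([row[j] for row in X], y)] for j in range(2)]

# Encode every row relative to the first instance, which is being explained.
explained = X[0]
binary = [to_interpretable(row, explained, thresholds) for row in X]
```

Because the thresholds are learnt from the class labels, the bin boundaries follow changes in the class rather than the quantiles of each feature. In this sketch each binary vector records, per feature, whether a data point shares the explained instance's bin.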
