Explainability in AI is crucial for model development, compliance with regulation, and providing operational nuance to predictions. The Shapley framework for explainability attributes a model's predictions to its input features in a mathematically principled and model-agnostic way. However, general implementations of Shapley explainability make an untenable assumption: that the model's features are uncorrelated. In this work, we demonstrate unambiguous drawbacks of this assumption and develop two solutions to Shapley explainability that respect the data manifold. One solution, based on generative modelling, provides flexible access to data imputations; the other directly learns the Shapley value-function, providing performance and stability at the cost of flexibility. While "off-manifold" Shapley values can (i) give rise to incorrect explanations, (ii) hide implicit model dependence on sensitive attributes, and (iii) lead to unintelligible explanations in higher-dimensional data, on-manifold explainability overcomes these problems.
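For concreteness, the contrast between the usual "off-manifold" value function and its on-manifold counterpart can be sketched as follows; the notation ($\phi_i$, $v_x$, $f$, $p$, feature set $N$, coalition $S$ with complement $\bar{S}$) follows common Shapley conventions and is introduced here for illustration rather than quoted from the paper's own definitions. The Shapley attribution of feature $i$ for model $f$ at input $x$ is
\[
\phi_i(x) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\bigl(|N|-|S|-1\bigr)!}{|N|!}\,\Bigl[v_x\bigl(S \cup \{i\}\bigr) - v_x(S)\Bigr],
\]
where the off-manifold value function imputes out-of-coalition features from the marginal distribution, implicitly treating features as uncorrelated,
\[
v_x^{\text{off}}(S) \;=\; \mathbb{E}_{p(x'_{\bar{S}})}\bigl[f(x_S, x'_{\bar{S}})\bigr],
\]
while an on-manifold value function conditions the imputations on the in-coalition features,
\[
v_x^{\text{on}}(S) \;=\; \mathbb{E}_{p(x'_{\bar{S}} \mid x_S)}\bigl[f(x_S, x'_{\bar{S}})\bigr].
\]
The two solutions described above correspond, under this sketch, to approximating $p(x'_{\bar{S}} \mid x_S)$ with a generative model versus learning $v_x^{\text{on}}(S)$ directly.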