The importance of explainability in machine learning continues to grow, as both neural-network architectures and the data they model become increasingly complex. Unique challenges arise when a model's input features become high dimensional: on one hand, principled model-agnostic approaches to explainability become too computationally expensive; on the other, more efficient explainability algorithms lack natural interpretations for general users. In this work, we introduce a framework for human-interpretable explainability on high-dimensional data, consisting of two modules. First, we apply a semantically meaningful latent representation, both to reduce the raw dimensionality of the data, and to ensure its human interpretability. These latent features can be learnt, e.g. explicitly as disentangled representations or implicitly through image-to-image translation, or they can be based on any computable quantities the user chooses. Second, we adapt the Shapley paradigm for model-agnostic explainability to operate on these latent features. This leads to interpretable model explanations that are both theoretically controlled and computationally tractable. We benchmark our approach on synthetic data and demonstrate its effectiveness on several image-classification tasks.
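To make the second module concrete, below is a minimal sketch of computing exact Shapley values over a small set of latent features. The callables `model`, `encode`, and `decode`, the argument names, and the use of a latent baseline for "absent" features are all illustrative assumptions, not the paper's implementation; exact enumeration is only tractable here because the latent dimensionality is assumed to be small.

```python
# Sketch: exact Shapley attribution over latent features (hypothetical API).
from itertools import combinations
from math import factorial

import numpy as np


def shapley_on_latents(model, encode, decode, x, baseline_z):
    """Exact Shapley values for each latent coordinate of input x.

    model      : callable mapping an input to a scalar score
    encode     : callable mapping an input to a latent vector z
    decode     : callable mapping a latent vector back to input space
    x          : the input to explain
    baseline_z : latent vector standing in for "absent" features
                 (e.g. a dataset-mean latent; an assumption of this sketch)
    """
    z = np.asarray(encode(x))
    d = len(z)
    players = list(range(d))

    def value(coalition):
        # Keep the latent coordinates in the coalition, replace the rest
        # with the baseline, then score the decoded input with the model.
        mask = np.isin(np.arange(d), list(coalition))
        z_mix = np.where(mask, z, baseline_z)
        return model(decode(z_mix))

    phi = np.zeros(d)
    for i in players:
        others = [j for j in players if j != i]
        for r in range(d):
            for S in combinations(others, r):
                # Standard Shapley weight |S|!(d-|S|-1)!/d! times the
                # marginal contribution of feature i to coalition S.
                w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                phi[i] += w * (value(set(S) | {i}) - value(set(S)))
    return phi
```

In practice one would swap the exact enumeration for a sampling-based approximation once the number of latent features grows, but the key point of the framework stands: attributions are assigned to semantically meaningful latent features rather than raw pixels.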