Over the past few years, different types of data-driven Artificial Intelligence (AI) techniques have been widely adopted across scientific domains for generating predictive models. However, because of their black-box nature, it is crucial to establish trust in these models before accepting them as accurate. One way of achieving this goal is through a post-hoc interpretation scheme that can put forward the reasons behind a black-box model's prediction. In this work, we propose a classical thermodynamics-inspired approach for this purpose: Thermodynamically Explainable Representations of AI and other black-box Paradigms (TERP). TERP works by constructing a linear, local surrogate model that approximates the behaviour of the black-box model within a small neighborhood around the instance being explained. By employing a simple forward feature selection algorithm, TERP assigns an interpretability score to each candidate surrogate model. TERP then selects an optimal interpretation from these candidates by drawing simple parallels with classical thermodynamics, improving interpretability relative to existing methods. To validate TERP as a generally applicable method, we demonstrate how it can be used to obtain interpretations of a wide range of black-box model architectures, including deep learning autoencoders, recurrent neural networks, and convolutional neural networks, applied to molecular simulations, image classification, and text classification, respectively.
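The following is a minimal sketch of the surrogate-model construction described above: perturb the instance to sample a local neighborhood, query the black-box model, and run forward feature selection over weighted linear fits to produce one candidate surrogate per subset size. The Gaussian neighborhood sampling, the proximity-weighting kernel, and the weighted-error criterion are illustrative assumptions; the paper's thermodynamics-inspired interpretability score used to pick the optimal candidate is not reproduced here.

```python
# Sketch of a TERP-style local linear surrogate with forward feature
# selection. Sampling scheme, kernel, and error measure are assumptions
# for illustration, not the authors' exact implementation.
import numpy as np
from sklearn.linear_model import LinearRegression

def local_surrogates(black_box, x, n_samples=500, sigma=0.25, rng=None):
    """Rank feature subsets by how well a weighted linear model
    approximates `black_box` in a neighborhood around instance `x`."""
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    # Sample a local neighborhood around x and query the black-box model.
    X = x + sigma * rng.standard_normal((n_samples, d))
    y = black_box(X)
    # Weight samples by proximity to x (Gaussian kernel; an assumption).
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * sigma ** 2))

    selected, candidates = [], []
    remaining = list(range(d))
    for _ in range(d):  # forward feature selection
        best_err, best_f = np.inf, None
        for f in remaining:
            cols = selected + [f]
            m = LinearRegression().fit(X[:, cols], y, sample_weight=w)
            err = np.average((m.predict(X[:, cols]) - y) ** 2, weights=w)
            if err < best_err:
                best_err, best_f = err, f
        selected.append(best_f)
        remaining.remove(best_f)
        candidates.append((list(selected), best_err))
    # One candidate surrogate per subset size; TERP would then select
    # the optimal one via its thermodynamics-inspired score.
    return candidates

if __name__ == "__main__":
    # Toy black-box: prediction depends only on features 0 and 2.
    f = lambda X: np.tanh(2 * X[:, 0] - 0.5 * X[:, 2])
    x0 = np.array([0.1, -0.3, 0.7, 0.0])
    for feats, err in local_surrogates(f, x0, rng=0):
        print(f"features {feats}: weighted error {err:.4f}")
```

On this toy problem, the weighted error drops sharply once features 0 and 2 enter the subset and plateaus afterwards, which is the kind of fit-versus-simplicity trade-off the selection step is meant to resolve.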