The lack of interpretability has hindered the large-scale adoption of AI technologies. However, what interpretability fundamentally means, and how to put it into practice, remains unclear. In this study, we provide a notion of interpretability grounded in approximation theory. We first apply this approximation-based interpretation to a specific model (a fully connected neural network) and then propose using an MLP as a universal interpreter to explain arbitrary black-box models. Extensive experiments demonstrate the effectiveness of our approach.
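The idea of an MLP as a universal interpreter can be illustrated with a minimal sketch: train a small MLP to mimic the predictions of an opaque model, then measure how faithfully it does so. This is an illustrative distillation-style example using scikit-learn, not the paper's actual implementation; the model choices and data are assumptions for demonstration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 3))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]  # illustrative ground-truth function

# Black-box model to be explained (any opaque predictor could stand in here).
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

# MLP interpreter: fit to the black box's *outputs*, not the original labels,
# so it approximates the black-box function itself.
interpreter = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                           random_state=0).fit(X, black_box.predict(X))

# Fidelity: how closely the interpreter tracks the black box on held-out inputs.
X_test = rng.uniform(-1, 1, size=(500, 3))
fidelity_mse = np.mean(
    (interpreter.predict(X_test) - black_box.predict(X_test)) ** 2)
```

A low fidelity MSE indicates the transparent MLP is a faithful surrogate of the black box on the input distribution, which is the premise behind explaining arbitrary models through approximation.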