Deep neural networks are becoming increasingly popular due to their revolutionary success in diverse areas, such as computer vision, natural language processing, and speech recognition. However, the decision-making processes of these models are generally not interpretable to users. In various domains, such as healthcare, finance, or law, it is critical to know the reasons behind a decision made by an artificial intelligence system. Therefore, several directions for explaining neural models have recently been explored. In this thesis, I investigate two major directions for explaining deep neural networks. The first direction consists of feature-based post-hoc explanatory methods, that is, methods that explain an already trained and fixed model (post-hoc) and provide explanations in terms of input features, such as tokens for text and superpixels for images (feature-based). The second direction consists of self-explanatory neural models that generate natural language explanations, that is, models with a built-in module that produces explanations for the model's predictions.
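To make the first direction concrete, the following is a minimal sketch of one simple feature-based post-hoc technique, occlusion-based attribution over text tokens; it is an illustrative example rather than the specific methods studied in this thesis, and `model_predict_proba` together with the toy sentiment model are hypothetical placeholders for any trained, fixed classifier.

```python
# Occlusion-based attribution: score each input token by how much removing it
# lowers the model's confidence in the predicted class. This is a generic
# illustration of a feature-based post-hoc explanation, not the thesis's method.

from typing import Callable, List, Tuple


def occlusion_attributions(
    tokens: List[str],
    model_predict_proba: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Return (token, importance) pairs for a fixed, already trained model."""
    baseline = model_predict_proba(tokens)
    scores = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]  # drop the i-th token
        scores.append((tokens[i], baseline - model_predict_proba(occluded)))
    return scores


if __name__ == "__main__":
    # Toy "model": confidence grows with the number of positive words present.
    positive = {"great", "fantastic"}

    def toy_model(tokens: List[str]) -> float:
        hits = sum(token in positive for token in tokens)
        return min(0.5 + 0.2 * hits, 1.0)

    for token, score in occlusion_attributions(
        ["the", "movie", "was", "great"], toy_model
    ):
        print(f"{token}: {score:+.2f}")
```

In this sketch, tokens whose removal causes a large drop in the predicted probability receive high importance scores; the same idea applies to images by occluding superpixels instead of tokens.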