Interpreting the predictions of existing Question Answering (QA) models is critical to many real-world intelligent applications, such as QA systems for healthcare, education, and finance. However, existing QA models lack interpretability and provide no feedback or explanation to help end-users understand why a specific prediction is the answer to a question. In this research, we argue that the evidence for an answer is critical to enhancing the interpretability of QA models. Unlike previous research that simply extracts sentences from the context as evidence, we are the first to explicitly define evidence as the supporting facts in a context that are informative, concise, and readable. We also provide effective strategies to quantitatively measure the informativeness, conciseness, and readability of evidence. Furthermore, we propose the Grow-and-Clip Evidence Distillation (GCED) algorithm, which extracts evidence from contexts by trading off informativeness, conciseness, and readability. We conduct extensive experiments on the SQuAD and TriviaQA datasets with several baseline models to evaluate the effect of GCED on interpreting answers to questions. Human evaluations are also carried out to check the quality of the distilled evidence. Experimental results show that automatically distilled evidence has human-like informativeness, conciseness, and readability, which can enhance the interpretability of the answers to questions.
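To make the trade-off concrete, the following is a minimal, hypothetical sketch of a grow-and-clip style selection loop. The scoring functions here are toy stand-ins (word overlap for informativeness, a length penalty for conciseness; readability is omitted), not the paper's actual measures or the GCED algorithm itself.

```python
# Toy sketch of a grow-and-clip evidence selection loop (NOT the paper's GCED).
# Informativeness and conciseness below are illustrative proxies only.

def informativeness(evidence, question, answer):
    """Toy proxy: fraction of question+answer words covered by the evidence."""
    target = set(question.lower().split()) | set(answer.lower().split())
    covered = target & {w.lower() for w in evidence}
    return len(covered) / len(target) if target else 0.0

def conciseness(evidence):
    """Toy proxy: shorter evidence scores higher."""
    return 1.0 / (1.0 + len(evidence))

def score(evidence, question, answer, alpha=0.9):
    # Weighted trade-off between the two toy measures.
    return (alpha * informativeness(evidence, question, answer)
            + (1 - alpha) * conciseness(evidence))

def grow_and_clip(context, question, answer, alpha=0.9):
    words = context.split()
    evidence = []
    # Grow phase: greedily add the context word that most improves the score.
    improved = True
    while improved:
        improved = False
        best_gain, best_word = 0.0, None
        for w in words:
            if w in evidence:
                continue
            gain = (score(evidence + [w], question, answer, alpha)
                    - score(evidence, question, answer, alpha))
            if gain > best_gain:
                best_gain, best_word = gain, w
        if best_word is not None:
            evidence.append(best_word)
            improved = True
    # Clip phase: drop any word whose removal does not lower the score.
    for w in list(evidence):
        trimmed = [x for x in evidence if x != w]
        if score(trimmed, question, answer, alpha) >= score(evidence, question, answer, alpha):
            evidence = trimmed
    return evidence
```

A real implementation would additionally score readability (e.g. with a language model) and operate over spans rather than isolated words, but the grow-then-clip structure of the loop is the point of the sketch.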