Multi-hop QA with annotated supporting facts, a reading comprehension (RC) task that takes the interpretability of the answer into account, has been extensively studied. In this study, we define an interpretable reading comprehension (IRC) model as a pipeline model that can also predict that a query is unanswerable. For interpretability, the IRC model justifies its answer prediction by establishing consistency between the predicted supporting facts and the actual rationale. For reliability, it detects unanswerable questions instead of forcing an answer from insufficient information. We also propose an end-to-end training method for the pipeline RC model. To evaluate interpretability and reliability, we conducted experiments that consider unanswerability in multi-hop questions for a given passage. We show that our end-to-end trainable pipeline model outperformed a non-interpretable model on our modified HotpotQA dataset. Experimental results also show that the IRC model achieves results comparable to those of previous non-interpretable models despite the trade-off between prediction performance and interpretability.