With the development of deep learning techniques and large-scale datasets, question answering (QA) systems have improved quickly, providing more accurate and satisfying answers. However, current QA systems focus either on sentence-level answers, i.e., answer selection, or on phrase-level answers, i.e., machine reading comprehension. How to produce compositional answers has not been thoroughly investigated. In compositional question answering (ComQA), a system must assemble several pieces of supporting evidence from the document to generate the final answer, which is more difficult than sentence-level or phrase-level QA. In this paper, we present a large-scale compositional question answering dataset containing more than 120k human-labeled questions. Each answer in this dataset is composed of discontiguous sentences in the corresponding document. To tackle the ComQA problem, we propose a hierarchical graph neural network that represents the document at multiple levels, from low-level words to high-level sentences. We also devise question selection and node selection tasks for pre-training. Our proposed model achieves a significant improvement over previous machine reading comprehension methods and pre-training methods. Code and dataset can be found at https://github.com/benywon/ComQA.
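To make the task formulation concrete, the following is a minimal sketch of how a compositional answer could be assembled from discontiguous sentences. The `ComQAExample` record, its field names, and the sample question are hypothetical and chosen only for illustration; the released dataset may use a different layout, and this sketch does not reproduce the proposed model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComQAExample:
    """Hypothetical record layout; the released dataset may use different fields."""
    question: str
    sentences: List[str]  # the document, split into sentences
    labels: List[int]     # 1 if the sentence belongs to the answer, else 0


def compose_answer(example: ComQAExample) -> str:
    """Join the selected sentences, in document order, into one answer string."""
    if len(example.sentences) != len(example.labels):
        raise ValueError("one label is required per sentence")
    picked = [s for s, keep in zip(example.sentences, example.labels) if keep == 1]
    return " ".join(picked)


# Illustrative data only: sentences 0 and 2 are selected, sentence 1 is skipped.
example = ComQAExample(
    question="How do I reset the router?",
    sentences=[
        "Unplug the router from the power outlet.",
        "Many routers ship with a default password.",
        "Wait thirty seconds, then plug it back in.",
    ],
    labels=[1, 0, 1],
)
print(compose_answer(example))
```

In this framing a model would predict one binary label per sentence, so the answer can skip over irrelevant sentences instead of being restricted to a single sentence or a contiguous span.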