Recent studies on Question Answering (QA) and Conversational QA (ConvQA) emphasize the role of retrieval: a system first retrieves evidence from a large collection and then extracts answers. This open-retrieval ConvQA setting typically assumes that each question is answerable by a single span of text within a particular passage (a span answer). The supervision signal is thus derived from whether the system can recover an exact match of this ground-truth answer span from the retrieved passages. This method is referred to as span-match weak supervision. However, information-seeking conversations are challenging for this span-match method, since long answers, especially freeform answers, are not necessarily exact spans of any passage. We therefore introduce a learned weak supervision approach that can identify a paraphrased span of the known answer in a passage. Our experiments on the QuAC and CoQA datasets show that the span-match weak supervisor can only handle conversations with span answers and performs less satisfactorily on freeform, human-generated answers. Our method is more flexible, as it can handle both span answers and freeform answers. Moreover, combining our method with the span-match method yields further gains, showing that the two are complementary. We also conduct in-depth analyses that provide further insight into open-retrieval ConvQA under a weak supervision setting.
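To make the span-match supervision signal concrete, the following is a minimal sketch (not the paper's implementation; the function and variable names are illustrative) of how a retrieved passage receives a weak positive label only when it contains the ground-truth answer verbatim:

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching is not layout-sensitive."""
    return " ".join(text.lower().split())

def span_match_label(gold_answer: str, passage: str) -> int:
    """Span-match weak supervision: return 1 if the gold answer occurs
    verbatim in the passage, else 0. A passage is treated as a positive
    training example only on an exact match of the answer span."""
    return int(normalize(gold_answer) in normalize(passage))

passages = [
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "She was the first woman to receive such an honor.",
]
labels = [span_match_label("Nobel Prize in Physics", p) for p in passages]
# A freeform answer such as "She was recognized for her work on radiation"
# would match no passage at all, which is exactly the limitation that
# motivates the learned weak supervisor.
```

Under this rule, any human-written freeform answer that paraphrases the evidence produces no positive labels, starving the retriever and reader of training signal.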