Conversational question answering (CQA) facilitates an incremental and interactive understanding of a given context, but building a CQA system is difficult in many domains because of data scarcity. In this paper, we introduce a novel method for synthesizing CQA data with various question types, including open-ended, closed-ended, and unanswerable questions. We design a separate generation flow for each question type and combine the flows in a single, shared framework. Moreover, we devise a hierarchical answerability classification (hierarchical AC) module that improves the quality of the synthetic data while also acquiring unanswerable questions. Manual inspection shows that synthetic data generated with our framework have characteristics very similar to those of human-generated conversations. Across four domains, CQA systems trained on our synthetic data perform close to systems trained on human-annotated data.