In this paper, we introduce a novel framework, SIMSEEK (Simulating Information-Seeking conversation from unlabeled documents), and compare its two variants. In our baseline, SIMSEEK-SYM, a questioner generates follow-up questions conditioned on an answer predetermined by an answerer. In contrast, SIMSEEK-ASYM first generates the question and then finds its corresponding answer under the conversational context. Our experiments show that both variants can synthesize effective training resources for conversational question answering (CQA) and conversational search tasks. Conversations from SIMSEEK-ASYM not only yield larger improvements in our experiments but are also rated more favorably in a human evaluation. Finally, we release a large-scale resource of synthetic conversations, WIKI-SIMSEEK, containing 2 million CQA pairs built upon Wikipedia documents. With this dataset, our CQA model achieves state-of-the-art performance on a recent CQA benchmark, QuAC.
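The difference between the two variants is the order in which each turn is produced, which the following minimal sketch tries to make concrete. It is an illustration under stated assumptions, not the framework's implementation: the function names, the callable signatures, and the choice of which inputs each component sees are all hypothetical, and the toy components are simple stand-ins for learned questioner and answerer models.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (question, answer)


def simulate_sym(document: str,
                 pick_answer: Callable[[str, List[Turn]], str],
                 ask_given_answer: Callable[[str, List[Turn]], str],
                 num_turns: int) -> List[Turn]:
    """SYM-style order: the answer is fixed first, then a question is written for it."""
    history: List[Turn] = []
    for _ in range(num_turns):
        answer = pick_answer(document, history)        # answerer goes first
        question = ask_given_answer(answer, history)   # questioner sees the answer
        history.append((question, answer))
    return history


def simulate_asym(document: str,
                  ask: Callable[[List[Turn]], str],
                  find_answer: Callable[[str, str, List[Turn]], str],
                  num_turns: int) -> List[Turn]:
    """ASYM-style order: the question is generated first, then its answer is found."""
    history: List[Turn] = []
    for _ in range(num_turns):
        # Assumption: the questioner here only sees the conversation so far.
        question = ask(history)
        answer = find_answer(document, question, history)
        history.append((question, answer))
    return history


# Toy stand-ins (hypothetical): real components would be trained models.
def _sentences(document: str) -> List[str]:
    return [s.strip() for s in document.split(".") if s.strip()]


def toy_pick_answer(document: str, history: List[Turn]) -> str:
    sents = _sentences(document)
    return sents[len(history) % len(sents)]


def toy_ask_given_answer(answer: str, history: List[Turn]) -> str:
    return f"Which part of the text says '{answer}'?"


def toy_ask(history: List[Turn]) -> str:
    return f"Question {len(history) + 1}?"


def toy_find_answer(document: str, question: str, history: List[Turn]) -> str:
    sents = _sentences(document)
    return sents[len(history) % len(sents)]
```

Calling `simulate_sym(doc, toy_pick_answer, toy_ask_given_answer, 2)` or `simulate_asym(doc, toy_ask, toy_find_answer, 2)` on a short `doc` string returns a list of (question, answer) turns. Passing the components as callables keeps the two loops identical except for the ordering of the question and answer steps, which is the contrast the abstract draws.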