In this paper, we introduce a novel framework, SimSeek (simulating information-seeking conversation from unlabeled documents), and compare two of its variants to gain a deeper perspective on information-seeking behavior. We first introduce SimSeek-sym, a strong simulator for information-symmetric conversation in which the questioner and answerer share all knowledge while conversing. Although SimSeek-sym simulates reasonable conversations, we take a further step toward more realistic information-seeking conversation. We therefore propose SimSeek-asym, which assumes information asymmetry between the two agents and thereby encourages the questioner to seek new information from a document it cannot access. In our experiments, we demonstrate that SimSeek-asym successfully generates information-seeking conversations for two downstream tasks, conversational question answering (CQA) and conversational search. In particular, SimSeek-asym improves baseline models by 1.1-1.9 F1 on QuAC and by 1.1 MRR on OR-QuAC. Moreover, we thoroughly analyze our synthetic datasets to identify crucial factors for realistic information-seeking conversation.