We propose MultiDoc2Dial, a new task and dataset for modeling goal-oriented dialogues grounded in multiple documents. Most previous work treats document-grounded dialogue modeling as a machine reading comprehension task based on a single given document or passage. In this work, we aim to address more realistic scenarios in which a goal-oriented, information-seeking conversation involves multiple topics and is hence grounded in different documents. To facilitate such a task, we introduce a new dataset that contains dialogues grounded in multiple documents from four different domains. We also explore modeling the dialogue-based and document-based contexts in the dataset. We present strong baseline approaches and various experimental results, aiming to support further research efforts on such a task.