We introduce the StatCan Dialogue Dataset, consisting of 19,379 conversation turns between agents working at Statistics Canada and online users looking for published data tables. The conversations stem from genuine intents, are held in English or French, and lead to agents retrieving one of over 5,000 complex data tables. Based on this dataset, we propose two tasks: (1) automatic retrieval of relevant tables based on an ongoing conversation, and (2) automatic generation of appropriate agent responses at each turn. We investigate the difficulty of each task by establishing strong baselines. Our experiments on a temporal data split reveal that all models struggle to generalize to future conversations: we observe a significant drop in performance across both tasks when moving from the validation set to the test set. In addition, we find that response generation models struggle to decide when to return a table. Given that the tasks pose significant challenges to existing models, we encourage the community to develop models for them, which can be directly used to help knowledge workers find relevant tables for live chat users.
Title: The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents