HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data

A pressing challenge in current dialogue systems is to successfully converse with users on topics with information distributed across different modalities. Previous work in multiturn dialogue systems has primarily focused on either text or table information. In more realistic scenarios, having a joint understanding of both is critical as knowledge is typically distributed over both unstructured and structured forms. We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables. The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions. We propose retrieval, system state tracking, and dialogue response generation tasks for our dataset and conduct baseline experiments for each. Our results show that there is still ample opportunity for improvement, demonstrating the importance of building stronger dialogue systems that can reason over the complex setting of information-seeking dialogue grounded on tables and text.
