Incorporating external knowledge sources effectively into conversations is a longstanding problem in open-domain dialogue research. The existing literature on open-domain knowledge selection is limited and makes certain brittle assumptions about knowledge sources to simplify the overall task (Dinan et al., 2019), such as the existence of a single relevant knowledge sentence per context. In this work, we evaluate the existing state of open-domain conversational knowledge selection, showing where current methodologies for data collection and evaluation are flawed. We then improve on them by proposing a new framework for collecting relevant knowledge, and create an augmented dataset based on the Wizard of Wikipedia (WOW) corpus, which we call WOW++. WOW++ averages 8 relevant knowledge sentences per dialogue context, embracing the inherent ambiguity of open-domain dialogue knowledge selection. We then benchmark various knowledge ranking algorithms on this augmented dataset using both intrinsic evaluation and extrinsic measures of response quality, showing that neural rerankers trained on WOW++ can outperform rankers trained on standard datasets.
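To make the knowledge ranking task concrete, the sketch below ranks candidate knowledge sentences against a dialogue context. This is a hypothetical lexical-overlap baseline written for illustration, not the neural rerankers evaluated in the paper; the function name and tokenization are assumptions.

```python
import math
from collections import Counter

def rank_knowledge(context: str, candidates: list[str]) -> list[str]:
    """Rank candidate knowledge sentences by cosine similarity of
    bag-of-words vectors with the dialogue context (a simple lexical
    baseline; real systems would use a learned reranker)."""
    def vec(text: str) -> Counter:
        # Naive whitespace tokenization, lowercased.
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    ctx = vec(context)
    # Stable sort: ties keep their original candidate order.
    return sorted(candidates, key=lambda c: cosine(ctx, vec(c)), reverse=True)
```

Because WOW++ annotates multiple relevant sentences per context, a ranker like this is naturally evaluated with ranking metrics (e.g., recall@k over the relevant set) rather than single-answer accuracy.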