Open-domain dialogue systems aim to converse with humans through text, and research on them has relied heavily on benchmark datasets. In this work, we first identify an overlapping problem in DailyDialog and OpenSubtitles, two popular open-domain dialogue benchmark datasets. Our systematic analysis then shows that this overlap can be exploited to obtain spurious state-of-the-art performance. Finally, we address the issue by cleaning these datasets and establishing a proper data processing procedure for future research.