Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation

Multiple different responses are often plausible for a given open domain dialog context. Prior work has shown the importance of having multiple valid reference responses for meaningful and robust automated evaluation. In such cases, common practice has been to collect more human-written references. However, such collection can be expensive, time-consuming, and not easily scalable. Instead, we propose a novel technique for automatically expanding a human-generated reference into a set of candidate references. We fetch plausible references from knowledge sources and adapt them so that they are more fluent in the context of the dialog instance in question. More specifically, we use (1) a commonsense knowledge base to elicit a large number of plausible reactions given the dialog history, and (2) relevant instances retrieved from a dialog corpus, using similar past as well as future contexts. We demonstrate that our automatically expanded reference sets lead to large improvements in the correlation of automated metrics with human ratings of system outputs on the DailyDialog dataset.
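
To make the idea of reference expansion concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy unigram F1 as a stand-in for the automated metrics, takes the maximum score over references (a common multi-reference convention, assumed here), and retrieves extra references from a small hypothetical corpus by past-context similarity only. The function names (`unigram_f1`, `multi_reference_score`, `retrieve_references`) and the example data are invented for illustration. The commonsense knowledge base, the use of future contexts, and the adaptation step described above are omitted.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy overlap metric: F1 over whitespace tokens (stand-in for BLEU-like metrics)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(candidate: str, references: list[str]) -> float:
    """Score a system output against a reference set by taking the best match."""
    return max((unigram_f1(candidate, r) for r in references), default=0.0)


def retrieve_references(context: str, corpus: list[tuple[str, str]], k: int = 2) -> list[str]:
    """Return the responses of the k corpus contexts most similar to `context`.

    `corpus` is a hypothetical list of (context, response) pairs.
    """
    ranked = sorted(corpus, key=lambda pair: unigram_f1(context, pair[0]), reverse=True)
    return [response for _, response in ranked[:k]]


# Hypothetical data, for illustration only.
context = "do you want to get lunch today"
human_reference = "sure where should we go"
corpus = [
    ("do you want to grab lunch", "sorry i already ate"),
    ("what time is the meeting", "it starts at noon"),
    ("want to get dinner today", "yes i am starving"),
]
system_output = "sorry i already ate lunch"

retrieved = retrieve_references(context, corpus, k=2)
single_score = multi_reference_score(system_output, [human_reference])
expanded_score = multi_reference_score(system_output, [human_reference] + retrieved)
```

In this toy case the system output is a plausible reply that shares no words with the single human reference, so it is rewarded only after the reference set is expanded with the retrieved responses.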

