Automating dialogue system evaluation is a driving force for the efficient development of dialogue systems. This paper introduces the bipartite-play method, a dialogue collection method for automating dialogue system evaluation. It addresses two limitations of existing dialogue collection methods: (i) the inability to compare against systems that are not publicly available, and (ii) vulnerability to cheating through intentional selection of the systems to be compared. Experimental results show that automatic evaluation using the bipartite-play method mitigates these two drawbacks and correlates with human subjective judgments as strongly as existing methods do.