This paper introduces a novel Self-supervised Fine-grained Dialogue Evaluation framework (SelF-Eval). The core idea is to model the correlation between the quality of individual turns and the quality of the entire dialogue. We first propose an automatic data construction method that assigns fine-grained scores to arbitrary dialogue data. We then train SelF-Eval with a multi-level contrastive learning schema, which helps the model distinguish between different score levels. Experimental results on multiple benchmarks show that SelF-Eval is highly consistent with human evaluations and outperforms state-of-the-art models. We also provide a detailed analysis of the experiments. Our code and data will be published on GitHub.